{"id":569,"date":"2024-10-18T02:45:56","date_gmt":"2024-10-18T02:45:56","guid":{"rendered":"https:\/\/pressbooks.ccconline.org\/mat1260\/?post_type=chapter&#038;p=569"},"modified":"2025-01-22T16:42:55","modified_gmt":"2025-01-22T16:42:55","slug":"9-4-confidence-intervals-for-means","status":"publish","type":"chapter","link":"https:\/\/pressbooks.ccconline.org\/mat1260\/chapter\/9-4-confidence-intervals-for-means\/","title":{"raw":"9.4: Confidence Intervals for Means","rendered":"9.4: Confidence Intervals for Means"},"content":{"raw":"<div id=\"lobjh\" class=\"multi\">\r\n<div class=\"textbox textbox--learning-objectives\"><header class=\"textbox__header\">\r\n<h2 class=\"textbox__title\">Learning Objectives<\/h2>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<ul>\r\n \t<li id=\"explain_confidence_interval\">Explain what a confidence interval represents and determine how changes in sample size and confidence level affect the precision of the confidence interval.<\/li>\r\n \t<li id=\"find_confidence_intervals\">Find confidence intervals for the population mean and the population proportion (when certain conditions are met), and perform sample size calculations.<\/li>\r\n<\/ul>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div id=\"N10AFC\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Overview<\/span><\/h2>\r\n<p id=\"N10B03\">As we mentioned in the introduction to interval estimation, we start by discussing interval estimation for the population mean \u03bc. 
Here is a quick overview of how we introduce this topic.<\/p>\r\n\r\n<ul>\r\n \t<li>Learn how a 95% confidence interval for the population mean \u03bc is constructed and interpreted.<\/li>\r\n \t<li>Generalize to confidence intervals with other levels of confidence (for example, what if we want a 99% confidence interval?).<\/li>\r\n \t<li>Understand more broadly the structure of a confidence interval and the importance of the margin of error.<\/li>\r\n \t<li>Understand how the precision of interval estimation is affected by the confidence level and sample size.<\/li>\r\n \t<li>Learn the conditions under which we can safely use the methods that are introduced in this section.<\/li>\r\n<\/ul>\r\n<p id=\"N10B1A\">Recall the IQ example:<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p id=\"N10B21\">Suppose that we are interested in studying the IQ levels of students at Smart University (SU). In particular (since IQ level is a quantitative variable), we are interested in estimating \u03bc, the mean IQ level of all the students at SU.<\/p>\r\n<p id=\"N10B24\">We will assume that from past research on IQ scores in different universities, it is known that the IQ standard deviation in such populations is \u03c3 = 15. In order to estimate \u03bc, a random sample of 100 SU students is chosen, and their (sample) mean IQ level is calculated (for now, let\u2019s not assume that the value of this sample mean is 115, as before).<\/p>\r\n<p id=\"N10B38\"><span class=\"imagewrap\"><span class=\"image\"><img id=\"_i_0\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the Population of all Students at SU. We are interested in the variable IQ, and the unknown parameter is \u03bc, the population mean IQ level. 
In addition, we know that \u03c3 = 15. From this population we create a sample of size n=100, represented by a smaller circle. In this sample, we need to find x-bar.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image033.gif\" alt=\"A large circle represents the Population of all Students at SU. We are interested in the variable IQ, and the unknown parameter is \u03bc, the population mean IQ level. In addition, we know that \u03c3 = 15. From this population we create a sample of size n=100, represented by a smaller circle. In this sample, we need to find x-bar.\" \/><\/span><\/span><\/p>\r\nWe will now show the rationale behind constructing a 95% confidence interval for the population mean \u03bc.\r\n<p id=\"N10B41\">* We learned in the \u201cSampling Distributions\u201d module of probability that, according to the central limit theorem, the sampling distribution of the sample mean [latex]\\overline{X}[\/latex] is approximately normal with a mean of \u03bc and a standard deviation of [latex]\\frac{\\sigma}{\\sqrt{n}}[\/latex]. In our example, then (\u03c3 = 15 and n = 100), the possible values of [latex]\\overline{X}[\/latex], the sample mean IQ level of 100 randomly chosen students, are approximately normally distributed, with mean \u03bc and standard deviation [latex]\\frac{15}{\\sqrt{100}}=1.5[\/latex].<\/p>\r\n<p id=\"N10BA8\">* Next, we recall and apply the Standard Deviation Rule for the normal distribution, and in particular its second part:<\/p>\r\n<p id=\"N10BAB\">There is a 95% chance that the sample mean we get in our sample falls within 2 * 1.5 = 3 of \u03bc.<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"_i_1\" class=\"img-responsive popimg aligncenter\" title=\"A distribution curve with a horizontal axis labeled &quot;X bar.&quot; The curve is centered at X-bar=\u03bc, and marked on the axis is \u03bc+3 and \u03bc-3. 
There is a 95% chance that any x-bar will fall between \u03bc-3 and \u03bc+3.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/2_iq.png\" alt=\"A distribution curve with a horizontal axis labeled &quot;X bar.&quot; The curve is centered at X-bar=\u03bc, and marked on the axis is \u03bc+3 and \u03bc-3. There is a 95% chance that any x-bar will fall between \u03bc-3 and \u03bc+3.\" height=\"350\" \/><\/span><\/span>\r\n<p id=\"N10BB5\">* Obviously, if there is a certain distance between the sample mean and the population mean, we can describe that distance by starting at either value. So, if the sample mean ([latex]\\overline{x}[\/latex]) falls within a certain distance of the population mean \u03bc, then the population mean \u03bc falls within the same distance of the sample mean.<\/p>\r\n<p id=\"N10BC8\">Therefore, the statement, \u201cThere is a 95%\u00a0<em>chance<\/em>\u00a0that the\u00a0<em>sample<\/em>\u00a0mean\u00a0[latex]\\overline{x}[\/latex]\u00a0falls within 3 units of \u03bc\u201d can be rephrased as: \u201cWe are 95%\u00a0<em>confident<\/em>\u00a0that the\u00a0<em>population<\/em>\u00a0mean \u03bc falls within 3 units of\u00a0[latex]\\overline{x}[\/latex].\u201d<\/p>\r\n<p id=\"N10BF7\">So, if we happen to get a sample mean of\u00a0[latex]\\overline{x}=115[\/latex], then we are 95% confident that \u03bc falls within 3 of 115, or in other words that \u03bc is covered by the interval (115 \u2013 3, 115 + 3) = (112, 118).<\/p>\r\n<p id=\"N10C10\">(Later in this section, we will use similar reasoning to develop a general formula for a confidence interval.)<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div id=\"N10C14\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Comment<\/span><\/h2>\r\n<p id=\"N10C1B\">Note that the first phrasing is about\u00a0[latex]\\overline{x}[\/latex], which is a random variable; that\u2019s 
why it makes sense to use probability language. But the second phrasing is about \u03bc, which is a parameter, and thus is a \u201cfixed\u201d value that doesn\u2019t change, and that\u2019s why we shouldn\u2019t use probability language to discuss it. This point will become clearer after you do the activities on the next page.<\/p>\r\n\r\n<div id=\"f12d872c2d2c4e50b7ba1628f01b94f2\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">The General Case<\/span><\/h2>\r\n<\/div>\r\n<\/div>\r\n<p id=\"c4df7cfa0bd94e5a8b37bc87a6939387\">Let\u2019s generalize the IQ example. Suppose that we are interested in estimating the unknown population mean (\u03bc) based on a random sample of size n. Further, we assume that the population standard deviation (\u03c3) is known.<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"d78e18e3028c41d9ac4b9b29364108a2\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population of interest. \u03bc is unknown, but \u03c3 is known about the population. From the population we create a SRS of size n, represented by a smaller circle. We can find x-bar for this SRS.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image036.gif\" alt=\"A large circle represents the population of interest. \u03bc is unknown, but \u03c3 is known about the population. From the population we create a SRS of size n, represented by a smaller circle. We can find x-bar for this SRS.\" \/><\/span><\/span>\r\n<p id=\"e0ed4af979a5434bb7fdcd3658ce130b\">The values of\u00a0[latex]\\overline{x}[\/latex]\u00a0follow a normal distribution with (unknown) mean \u03bc and standard deviation\u00a0[latex]\\frac{\\sigma }{\\sqrt{n}}[\/latex]\u00a0(known, since both \u03c3 and n are known). 
By the (second part of the) Standard Deviation Rule, this means that:<\/p>\r\n<p id=\"a023622336d94ea1b45c32cb66a0f93a\">There is a 95% chance that our sample mean ([latex]\\overline{x}[\/latex]) will fall within\u00a0[latex]2*\\frac{\\sigma }{\\sqrt{n}}[\/latex]\u00a0of \u03bc,<\/p>\r\n<p id=\"db9a948eef784491aea4dd9f31d6b96c\">which means that:<\/p>\r\n<p id=\"f923f5db00184ea2b08a99a3cc19d0a0\">We are 95% confident that \u03bc falls within [latex]2*\\frac{\\sigma}{\\sqrt{n}}[\/latex] of our sample mean ([latex]\\overline{x}[\/latex]).<\/p>\r\n<p id=\"bc2ed2a5f4214858965eb025fbe3e203\">Or, in other words, a 95% confidence interval for the population mean \u03bc is:<\/p>\r\n[latex]\\left ( \\bar{x}-2*\\frac{\\sigma }{\\sqrt{n}},\\bar{x}+2*\\frac{\\sigma }{\\sqrt{n}} \\right )[\/latex]\r\n<p id=\"adaf3869fce4426ab8081751e3a299e6\">Here, then, is the\u00a0<em class=\"italic\">general result:<\/em><\/p>\r\n<p id=\"be60f7aa5f6e49e986dad5bf77868cbd\">Suppose a random sample of size n is taken from a normal population of values for a quantitative variable whose mean (\u03bc) is unknown and whose standard deviation (\u03c3) is known. A 95% confidence interval (CI) for \u03bc is:<\/p>\r\n[latex]\\bar{x}\\pm2*\\frac{\\sigma}{\\sqrt{n}}[\/latex]\r\n<div id=\"cbd10bf98c704267a9c424db1386f3ca\" class=\"section purposewrap\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Comment<\/span><\/h2>\r\n<p id=\"afdb9e3cc56944e2b3f514195af2ec1e\">Note that for now we require the population standard deviation (\u03c3) to be known. In practice, \u03c3 is rarely known, but in some cases, especially when a lot of research has been done on the quantitative variable whose mean we are estimating (such as IQ, height, weight, or scores on standardized tests), it is reasonable to assume that \u03c3 is known. 
Eventually, we will see how to proceed when \u03c3 is unknown and must be estimated by the sample standard deviation (s).<\/p>\r\n<p id=\"b1fe25389bf34e8bbbf719782fe336d0\">Let\u2019s look at another example.<\/p>\r\n\r\n<div id=\"f4d4b73fafe44f9586233109884ce00b\" class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<div>\r\n<p id=\"f1496d1bcc23488ba91939cd3c5b099f\">An educational researcher was interested in estimating \u03bc, the mean score on the math part of the SAT (SAT-M) of all community college students in his state. To this end, the researcher chose a random sample of 650 community college students from his state and found that their average SAT-M score was 475. Based on a large body of research on the SAT, it is known that the scores roughly follow a normal distribution with standard deviation \u03c3 = 100.<\/p>\r\n<p id=\"f87612f13ef8410baafe9c1791a9990e\">Here is a visual representation of this story, which summarizes the information provided:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"c5be519f385c406fac212e13698dc0ef\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population of community college students in the researcher\u2019s state. 
\u03bc is unknown, but \u03c3 is known. From the population we create an SRS of size n=650, represented by a smaller circle. We find that x-bar=475 for this SRS.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image045.gif\" alt=\"A large circle represents the population of community college students in the researcher\u2019s state. \u03bc is unknown, but \u03c3 is known. From the population we create an SRS of size n=650, represented by a smaller circle. We find that x-bar=475 for this SRS.\" \/><\/span><\/span>\r\n<p id=\"ae681d2b1a484e12bc4ac91b55da58a3\">Based on this information, let\u2019s estimate \u03bc with a 95% confidence interval.<\/p>\r\n<p id=\"ab817cc09c1b4870887714af028f693f\">Using the formula we developed before,\u00a0[latex]\\bar{x}\\pm2*\\frac{\\sigma}{\\sqrt{n}}[\/latex], a 95% confidence interval for \u03bc is:<\/p>\r\n[latex]\\left(475-2*\\frac{100}{\\sqrt{650}},475+2*\\frac{100}{\\sqrt{650}}\\right)[\/latex], which is (475 \u2013 7.8, 475 + 7.8) = (467.2, 482.8). In this case, since SAT scores can only be whole numbers, it makes sense to round and say that the 95% confidence interval is (467, 483).\r\n<p id=\"d6856c208fa0492e9b1448bb11e4c9f5\">We are not done yet. An equally important part is to\u00a0<em class=\"italic\">interpret what this means in the context of the problem.<\/em><\/p>\r\n<p id=\"c7ebd531c5d04c30a56f11c17b4b9968\">We are 95% confident that the mean SAT-M score of all community college students in the researcher\u2019s state is covered by the interval (467, 483). 
Note that the confidence interval was obtained by taking\u00a0[latex]475\\pm 8[\/latex]\u00a0(rounded). This means that we are 95% confident that by using the sample mean ([latex]\\overline{x}=475[\/latex]) to estimate \u03bc, our error is no more than 8.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\n[h5p id=\"214\"]\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"section purposewrap\">\r\n<div class=\"sectionContain\">\r\n<p 
id=\"c89204d503a248e9b66a31b04f3b5b50\">We just saw that one interpretation of a 95% confidence interval is that we are 95% confident that the population mean (\u03bc) is contained in the interval. Another useful interpretation in practice is that, given the data, the confidence interval represents the set of plausible values for the population mean \u03bc.<\/p>\r\n\r\n<div id=\"ca878b0fea5a4dafa894857dc7c0ccc5\" class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<div>\r\n<p id=\"be62bc4621b04dc1b1eafd9ac6361bb4\">As an illustration, let\u2019s return to the example of the mean SAT-M score of community college students. Recall that we constructed the confidence interval (467, 483) for the unknown mean SAT-M score of all community college students.<\/p>\r\n<p id=\"d6b2be48039c40c4a16c214bae93bb85\">Here is a way that we can use the confidence interval:<\/p>\r\n<p id=\"c4adfa7de2564fa189996bb80c998e5c\">Do the results of this study provide evidence that \u03bc, the mean SAT-M score of community college students, is lower than the mean SAT-M score in the general population of college students in that state (which is 480)?<\/p>\r\n<p id=\"ca80c6a8b56c41088b398b22ddad7202\">The 95% confidence interval for \u03bc was found to be (467, 483). Note that 480, the mean SAT-M score in the general population of college students in that state, falls inside the interval, which means that it is one of the plausible values for \u03bc.<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"b94ae08c9ba845fb97233f2771718e6d\" class=\"img-responsive popimg aligncenter\" title=\"A number line, on which the 95% confidence interval for \u03bc has been marked, from 467 to 483. 
At 480 is the mean SAT-M score in the general population of college students in the state.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image050.gif\" alt=\"A number line, on which the 95% confidence interval for \u03bc has been marked, from 467 to 483. At 480 is the mean SAT-M score in the general population of college students in the state.\" \/><\/span><\/span>\r\n<p id=\"d39b6a2551694d519b42d10c72608c6e\">This means that \u03bc could be 480 (or even higher, up to 483), and therefore we cannot conclude that the mean SAT-M score among community college students in the state is lower than the mean in the general population of college students in that state. (Note that the fact that most of the plausible values for \u03bc fall below 480 is not a consideration here.)<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div id=\"f5d6bc2e92734e0d9ff7ce1659ec4114\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Comment<\/span><\/h2>\r\n<p id=\"f94118dc1e2b4b34a509356fdd6045ea\">Recall that in the formula for the 95% confidence interval for \u03bc,\u00a0[latex]\\bar{x}\\pm 2*\\frac{\\sigma }{\\sqrt{n}}[\/latex], the 2 comes from the Standard Deviation Rule, which says that any normal random variable (in our case\u00a0[latex]\\overline{X}[\/latex]) has a 95% chance (or probability of 0.95) of taking a value that is within 2 standard deviations of its mean.<\/p>\r\n<p id=\"adfb8adcd0d14f9496cbfeb17e8e2870\">As you may recall from the discussion of normal random variables, this is only an approximation; more precisely, there is a 95% chance that a normal random variable will take a value within 1.96 standard deviations of its mean. 
Therefore, a more accurate formula for the 95% confidence interval for \u03bc is [latex]\\bar{x}\\pm 1.96*\\frac{\\sigma }{\\sqrt{n}}[\/latex], which you\u2019ll find in most introductory statistics books. In this course, we\u2019ll use 2 (and not 1.96), which is close enough for our purposes.<\/p>\r\n\r\n<h2><span title=\"Quick scroll up\">Other Levels of Confidence<\/span><\/h2>\r\n<p id=\"N10B23\">The most commonly used level of confidence is 95%. However, we may wish to increase our level of confidence and produce an interval that is almost certain to contain \u03bc. Specifically, we may want to report an interval for which we are 99% confident, rather than only 95% confident, that it contains the unknown population mean.<\/p>\r\n<p id=\"N10B26\">Using the same reasoning as in the last comment, to create a 99% confidence interval for \u03bc we should ask: within how many standard deviations of its mean does a normal random variable fall with probability 0.99? The precise answer is 2.576; therefore, a 99% confidence interval for \u03bc is [latex]\\bar{x}\\pm2.576*\\frac{\\sigma}{\\sqrt{n}}[\/latex].<\/p>\r\n<p id=\"N10B55\">Another commonly used level of confidence is 90%. Since there is a probability of 0.90 that a normal random variable takes a value within 1.645 standard deviations of its mean, the 90% confidence interval for \u03bc is [latex]\\bar{x}\\pm1.645*\\frac{\\sigma}{\\sqrt{n}}[\/latex].<\/p>\r\n\r\n<div class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p id=\"N10B86\">Let\u2019s go back to our first example, the IQ example:<\/p>\r\n<p id=\"N10B89\">The IQ level of students at a particular university has an unknown mean, \u03bc, and a known standard deviation,\u00a0\u03c3 = 15. 
A simple random sample of 100 students is found to have a sample mean IQ of\u00a0[latex]\\overline{x} = 115[\/latex]. Estimate \u03bc with 90%, 95%, and 99% confidence intervals.<\/p>\r\n<p id=\"N10BB3\">A 90% confidence interval for \u03bc is\u00a0[latex] \\bar{x}\\pm 1.645\\frac{\\sigma }{\\sqrt{n}}=115\\pm 1.645\\left ( \\frac{15}{\\sqrt{100}} \\right )=115\\pm 2.5=(112.5, 117.5)[\/latex].<\/p>\r\n<p id=\"N10C37\">A 95% confidence interval for \u03bc is\u00a0[latex]\\bar{x}\\pm 2\\frac{\\sigma }{\\sqrt{n}}=115\\pm 2\\left ( \\frac{15}{\\sqrt{100}} \\right )=115\\pm 3.0=(112, 118)[\/latex].<\/p>\r\n<p id=\"N10CA3\">A 99% confidence interval for \u03bc is\u00a0[latex]\\bar{x}\\pm2.576*\\frac{\\sigma}{\\sqrt{n}}=115\\pm2.576\\left(\\frac{15}{\\sqrt{100}}\\right)=115\\pm 3.9=\\left(111.1,118.9\\right)\\approx\\left(111,119\\right)[\/latex].<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\n[h5p id=\"215\"]\r\n\r\n<\/div>\r\n<\/div>\r\nNote from the previous example and the previous \"Did I Get This?\" activity that the more confidence I require, the wider the confidence interval for \u03bc (pronounced \"mu\"). The 99% confidence interval is wider than the 95% confidence interval, which is wider than the 90% confidence interval.\r\n\r\n<span class=\"imagewrap\"><span class=\"image\"><img class=\"img-responsive popimg aligncenter\" title=\"A number line illustrating confidence intervals for \u03bc. x-bar is marked at 115. The interval from 112.5 to 117.5 is the 90% confidence interval. Enclosing this interval is the interval from 112 to 118, which is the 95% confidence interval. 
Even larger is the 99% confidence interval, ranging from 111 to 119.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image058.gif\" alt=\"A number line illustrating confidence intervals for \u03bc. x-bar is marked at 115. The interval from 112.5 to 117.5 is the 90% confidence interval. Enclosing this interval is the interval from 112 to 118, which is the 95% confidence interval. Even larger is the 99% confidence interval, ranging from 111 to 119.\" \/><\/span><\/span>\r\n<p id=\"N10D49\">This is not very surprising, given that in the 99% interval we multiply the standard deviation by 2.576, in the 95% by 2, and in the 90% only by 1.645. Beyond this numerical explanation, there is a very clear intuitive explanation and an important implication of this result.<\/p>\r\n<p id=\"N10D4C\">Let\u2019s start with the intuitive explanation. The more certain I want to be that the interval contains the value of \u03bc, the more plausible values the interval needs to include in order to account for that extra certainty. I am 95% certain that the value of \u03bc is one of the values in the interval (112, 118). In order to be 99% certain that one of the values in the interval is the value of \u03bc, I need to include more values, and thus provide a wider confidence interval.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<p id=\"N10D53\">In our example, the\u00a0<em>wider<\/em>\u00a099% confidence interval (111, 119) gives us a\u00a0<em>less precise<\/em>\u00a0estimate of the value of \u03bc than the narrower 90% confidence interval (112.5, 117.5), because the smaller interval \u201cnarrows in\u201d on the plausible values of \u03bc.<\/p>\r\n<p id=\"N10D5C\">The important practical implication here is that researchers must decide whether they prefer to state their results with a higher level of confidence or produce a more precise interval. 
In other words,<\/p>\r\n<p id=\"N10D5F\"><em>There is a trade-off between the level of confidence and the precision with which the parameter is estimated<\/em>.<\/p>\r\n<p id=\"N10D65\">The price we have to pay for a higher level of confidence is that the unknown population mean will be estimated with less precision (i.e., with a wider confidence interval). If we want to estimate \u03bc with more precision (i.e., with a narrower confidence interval), we will need to give up some confidence and report an interval with a lower confidence level.<\/p>\r\n\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p id=\"N10D72\">In a recent study, 1,115 males 25 to 35 years of age were randomly chosen and asked about their exercise habits. Based on the study results, the researchers estimated the mean time that a male 25 to 35 years of age spends exercising with 90%, 95%, and 99% confidence intervals. These were (not necessarily in the same order):<\/p>\r\n\r\n<div class=\"image shouldbeleft\"><img id=\"_i_1\" class=\"img-responsive popimg aligncenter\" title=\"Three number lines illustrating three confidence intervals. The first shows an interval of (3,4). The second, an interval of (2.5, 4.5), and the third, (2,5).\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/_u5_m1_dig3a.gif\" alt=\"Three number lines illustrating three confidence intervals. The first shows an interval of (3,4). 
The second, an interval of (2.5, 4.5), and the third, (2,5).\" \/><\/div>\r\n<div>[h5p id=\"216\"]<\/div>\r\n<\/div>\r\n<\/div>\r\n\r\n<hr \/>\r\n<p id=\"N10B04\">So far, we\u2019ve developed the confidence interval for the population mean from scratch, based on results from probability, and discussed the trade-off between the level of confidence and the precision of the interval. The price you pay for a higher level of confidence is a lower level of precision of the interval (i.e., a wider interval).<\/p>\r\n<p id=\"N10B07\">Is there a way to bypass this trade-off? In other words, is there a way to increase the precision of the interval (i.e., make it narrower)\u00a0<em class=\"italic\">without<\/em>\u00a0compromising on the level of confidence? We will answer this question shortly, but first we need to get a deeper understanding of the different components of the confidence interval and its structure.<\/p>\r\n\r\n<div id=\"N10B0E\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Understanding the general structure of the confidence intervals<\/span><\/h2>\r\n<p id=\"N10B15\">We explored the confidence interval for \u03bc for different levels of confidence and found that, in general, it has the following form:<\/p>\r\n[latex]\\bar{x}\\pm z^{*}\\cdot \\frac{\\sigma }{\\sqrt{n}}[\/latex],\r\n<p id=\"N10B47\">where z* is a general notation for the multiplier that depends on the level of confidence. 
As we discussed before:<\/p>\r\n<p id=\"N10B4A\">For a 90% level of confidence, z* = 1.645<\/p>\r\n<p id=\"N10B4D\">For a 95% level of confidence, z* = 2 (or 1.96 if you want to be really precise)<\/p>\r\n<p id=\"N10B50\">For a 99% level of confidence, z* = 2.576<\/p>\r\n<p id=\"N10B53\">To start our discussion about the structure of the confidence interval, let\u2019s denote the quantity [latex]z^{*}\\cdot\\frac{\\sigma }{\\sqrt{n}}[\/latex] by m.<\/p>\r\n<p id=\"N10B76\">The confidence interval, then, has the form\u00a0[latex]\\bar{x}\\pm m[\/latex]:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img class=\"img-responsive popimg aligncenter\" title=\"A formula: x-bar \u00b1 z-star \u00d7 \u03c3\/\u221an. Note that z-star \u00d7 \u03c3\/\u221an is m.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image062.gif\" alt=\"A formula: x-bar \u00b1 z-star \u00d7 \u03c3\/\u221an. Note that z-star \u00d7 \u03c3\/\u221an is m.\" \/><\/span><\/span>\r\n<p id=\"N10B95\">[latex]\\overline{x}[\/latex]\u00a0is the sample mean, the point estimator for the unknown population mean (\u03bc).<\/p>\r\n<em>m<\/em>\u00a0is called the\u00a0<em class=\"italic\">margin of error<\/em>, since it represents the maximum estimation error for a given level of confidence.\r\n<p id=\"N10BB2\">For example, for a 95% confidence interval, we are 95% confident that our estimate will not depart from the true population mean by more than m, the margin of error.<\/p>\r\nm is itself the product of two components:\r\n<p id=\"N10BB8\">z*, the confidence multiplier, and<\/p>\r\n[latex]\\frac{\\sigma }{\\sqrt{n}}[\/latex], which is the standard deviation of [latex]\\overline{x}[\/latex], the point estimator of \u03bc.\r\n<p id=\"N10BE2\">Here is a summary of the different components of the confidence interval and its structure:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img 
class=\"img-responsive popimg aligncenter\" title=\"x-bar is the point estimator. The margin of error (m) is either added to or subtracted from it. The margin of error is the confidence multiplier, z-star, multiplied by the standard deviation of the point estimator, \u03c3\/\u221an.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image063.gif\" alt=\"x-bar is the point estimator. The margin of error (m) is either added to or subtracted from it. The margin of error is the confidence multiplier, z-star, multiplied by the standard deviation of the point estimator, \u03c3\/\u221an.\" \/><\/span><\/span>\r\n<p id=\"N10BEB\">This structure:<\/p>\r\n<p id=\"N10BEE\"><em><strong>estimate [latex]\\pm[\/latex] margin of error<\/strong><\/em><\/p>\r\n<p id=\"N10C11\">where the margin of error is the product of a confidence multiplier and the standard deviation of the estimate (or, as we\u2019ll see, its standard error), is the general structure of all confidence intervals that we will encounter in this course.<\/p>\r\nAlthough every confidence interval has the same components, what those components actually are differs from one confidence interval to another, depending on which unknown parameter the interval aims to estimate.\r\n<p id=\"N10C17\">Since the confidence interval has a margin of error on either side of the estimate, it is centered at the estimate (in our case,\u00a0[latex]\\overline{x}[\/latex]), and its width (or length) is exactly twice the margin of error:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"_i_2\" class=\"img-responsive popimg aligncenter\" title=\"A number line, on which the estimate has been placed. To the left and to the right are two intervals, each of length m. 
So the confidence interval, which comprises both margins of error (the left one and the right one), has width 2m.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image065.gif\" alt=\"A number line, on which the estimate has been placed. To the left and to the right are two intervals with the size m. So the confidence interval, which comprises both margins of error (the left one and the right one), has width 2m.\" \/><\/span><\/span>\r\n<p id=\"N10C30\">The margin of error, m, is therefore \u201cin charge\u201d of the width (or precision) of the confidence interval, and the estimate is in charge of its location (and has no effect on the width).<\/p>\r\n\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\n[h5p id=\"217\"]\r\n\r\n<\/div>\r\n<\/div>\r\n<p id=\"N10B0C\">Let us now go back to the confidence interval for the mean, and more specifically, to the question that we posed at the beginning of the previous page:<\/p>\r\n<p id=\"N10B0F\">Is there a way to increase the precision of the confidence interval (i.e., make it narrower)\u00a0<em class=\"italic\">without<\/em>\u00a0compromising on the level of confidence?<\/p>\r\n<p id=\"N10B16\">Since the width of the confidence interval is a function of its margin of error, let\u2019s look closely at the margin of error of the confidence interval for the mean and see how it can be reduced:<\/p>\r\n[latex]z^{*}\\cdot\\frac{\\sigma}{\\sqrt{n}}[\/latex]\r\n<p id=\"N10B3C\">Since z* controls the level of confidence, we can rephrase our question above in the following way:<\/p>\r\n<p id=\"N10B3F\">Is there a way to reduce this margin of error other than by reducing z*?<\/p>\r\n<p id=\"N10B42\">If you look closely at the margin of error, you\u2019ll see that the answer is yes. 
We can do that by increasing the sample size n (since it appears in the denominator).<\/p>\r\n<p id=\"N10B49\">Let\u2019s look at an example first and then explain why increasing the sample size is a way to increase the precision of the confidence interval\u00a0<em class=\"italic\">without<\/em>\u00a0compromising on the level of confidence.<\/p>\r\n\r\n<div class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p id=\"N10B52\">Recall the IQ example:<\/p>\r\nThe IQ level of students at a particular university has an unknown mean (\u03bc) and a known standard deviation of \u03c3 = 15. A simple random sample of 100 students is found to have the sample mean IQ\u00a0[latex]\\overline{x}=115[\/latex]. A 95% confidence interval for \u03bc in this case is:\r\n[latex]\\bar{x}\\pm2\\cdot\\frac{\\sigma}{\\sqrt{n}}=115\\pm2\\left(\\frac{15}{\\sqrt{100}}\\right)=115\\pm3.0=\\left(112,118\\right)[\/latex]\r\nNote that the margin of error is m = 3, and therefore the width of the confidence interval is 6.\r\n\r\nNow, what if we change the problem slightly by increasing the sample size, and assume that it was 400 instead of 100?\r\n\r\n<span class=\"imagewrap\"><span class=\"image\"><img class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the Population of all Students at SU. We are interested in the variable IQ, and the unknown parameter is \u03bc, the population mean IQ level. In addition, we know that \u03c3 = 15. From this population we create a sample of size n=400, represented by a smaller circle. In this sample, we find that x bar = 115.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image067.gif\" alt=\"A large circle represents the Population of all Students at SU. 
We are interested in the variable IQ, and the unknown parameter is \u03bc, the population mean IQ level. In addition, we know that \u03c3 = 15. From this population we create a sample of size n=400, represented by a smaller circle. In this sample, we find that x bar = 115.\" \/><\/span><\/span>\r\n\r\nIn this case, the 95% confidence interval for \u03bc is:\r\n[latex]\\bar{x}\\pm2\\cdot\\frac{\\sigma}{\\sqrt{n}}=115\\pm2\\left(\\frac{15}{\\sqrt{400}}\\right)=115\\pm1.5=\\left(113.5,116.5\\right)[\/latex]\r\n<p id=\"N10C72\">The margin of error here is only m = 1.5, and thus the width is only 3.<\/p>\r\n<p id=\"N10C75\">Note that for the same level of confidence (95%) we now have a narrower, and thus more precise, confidence interval.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<p id=\"N10C79\">Let\u2019s try to understand why a larger sample size will reduce the margin of error for a fixed level of confidence. There are three ways to explain it: mathematically, using probability theory, and intuitively.<\/p>\r\n<p id=\"N10C7E\">We\u2019ve already alluded to the mathematical explanation; the margin of error is [latex]z^{*}\\cdot\\frac{\\sigma}{\\sqrt{n}}[\/latex], and since n, the sample size, appears in the denominator, increasing n will reduce the margin of error.<\/p>\r\n<p id=\"N10CA1\">As we saw in our discussion about point estimates, probability theory tells us that the standard deviation of\u00a0[latex]\\overline{x}[\/latex], which is [latex]\\frac{\\sigma}{\\sqrt{n}}[\/latex], gets smaller as n increases, so the sampling distribution of\u00a0[latex]\\overline{x}[\/latex]\u00a0is less spread out for larger samples:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img class=\"img-responsive popimg aligncenter\" title=\"Two sampling distribution curves for x-bar. One is squished down and wider, while the other is much taller and narrower. Both curves share the same \u03bc. The tall, narrow distribution was based on a larger sample size, which has a smaller standard deviation, and so is less spread out. 
This means that values of x-bar are more likely to be closer to \u03bc when the sample size is larger.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image069.gif\" alt=\"Two sampling distribution curves for x-bar. One is squished down and wider, while the other is much taller and narrower. Both curves share the same \u03bc. The tall, narrow distribution was based on a larger sample size, which has a smaller standard deviation, and so is less spread out. This means that values of x-bar are more likely to be closer to \u03bc when the sample size is larger.\" \/><\/span><\/span>\r\n<p id=\"N10CAA\">This explains why with a larger sample size the margin of error (which represents how far apart we believe\u00a0[latex]\\overline{x}[\/latex]\u00a0might be from \u03bc for a given level of confidence) is smaller.<\/p>\r\n<p id=\"N10CBD\">On an intuitive level, if our estimate\u00a0[latex]\\overline{x}[\/latex]\u00a0is based on a larger sample (i.e., on more observations), it is more reliable, and therefore we need to account for less error around it.<\/p>\r\n\r\n<div id=\"N10CD0\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Comment<\/span><\/h2>\r\n<p id=\"N10CD7\">For a given level of confidence, increasing the sample size does increase the precision of our interval estimation, but in practice, increasing the sample size is not always possible. Consider a study in which there is a non-negligible cost involved in collecting data from each participant (an expensive medical procedure, for example). If the study has budgetary constraints, which is usually the case, increasing the sample size from 100 to 400 may simply be unaffordable. 
Another instance in which increasing the sample size is impossible is when a larger sample is simply not available, even if we could afford it. For example, consider a study on the effectiveness of a drug in curing a very rare disease among children. Since the disease is rare, only a limited number of children could be participants. This is the reality of statistics. Sometimes theory collides with reality, and you just do the best you can.<\/p>\r\n\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\n[h5p id=\"218\"]\r\n\r\n<\/div>\r\n<\/div>\r\n<div id=\"N10B1C\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Sample Size Calculations<\/span><\/h2>\r\nAs we just learned, for a given level of confidence, the sample size determines the size of the margin of error and thus the width, or precision, of our interval estimation. This process can also be reversed: we can start from a desired margin of error and work out the sample size.\r\n\r\nIn situations where a researcher has some flexibility as to the sample size, the researcher can calculate in advance the sample size needed to report a confidence interval with a given level of confidence and a given margin of error. 
Let\u2019s look at an example.\r\n<div class=\"examplewrap\">\r\n<div class=\"exHead\"><span class=\"scnReader\">Example<\/span><\/div>\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<div class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div>\r\n<p id=\"N10B2B\">Recall the example about the SAT-M scores of community college students.<\/p>\r\n<p id=\"N10B2E\">An educational researcher is interested in estimating \u03bc, the mean score on the math part of the SAT (SAT-M) of all community college students in his state. To this end, the researcher has chosen a random sample of 650 community college students from his state, and found that their average SAT-M score is 475. Based on a large body of research that was done on the SAT, it is known that the scores roughly follow a normal distribution, with the standard deviation\u00a0\u03c3 = 100.<\/p>\r\nThe 95% confidence interval for \u03bc is [latex]\\left(475-2\\cdot\\frac{100}{\\sqrt{650}},475+2\\cdot\\frac{100}{\\sqrt{650}}\\right)[\/latex], which is roughly [latex]475\\pm8[\/latex], or (467, 483). 
For a sample size of n = 650, our margin of error is roughly 8.\r\n<p id=\"N10B9A\">Now, let\u2019s think about this problem in a slightly different way:<\/p>\r\n<p id=\"N10B9D\">An educational researcher is interested in estimating \u03bc, the mean score on the math part of the SAT (SAT-M) of all community college students in his state, with a margin of error of (only) 5, at the 95% confidence level. What is the sample size needed to achieve this? (\u03c3 is still assumed to be 100.)<\/p>\r\n<p id=\"N10BA0\">To solve this, we set:<\/p>\r\n[latex]m=2\\cdot\\frac{100}{\\sqrt{n}}=5[\/latex]\r\n\r\nso\r\n\r\n[latex]\\sqrt{n}=\\frac{2\\left(100\\right)}{5}=40[\/latex]\r\n<p id=\"N10BF5\">and<\/p>\r\n[latex]n=\\left(\\frac{2\\left(100\\right)}{5}\\right)^2=1600[\/latex]\r\n<p id=\"N10C2D\">So, for a sample size of 1,600 community college students, the researcher will be able to estimate \u03bc with a margin of error of 5, at the 95% level. In this example, we can also imagine that the researcher has some flexibility in choosing the sample size, since there is a minimal cost (if any) involved in recording students\u2019 SAT-M scores, and there are many more than 1,600 community college students in each state.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<p id=\"N10C31\">Rather than take the same steps to isolate n every time we solve such a problem, we may obtain a general expression for the required n for a desired margin of error m and a certain level of confidence.<\/p>\r\n<p id=\"N10C34\">Since\u00a0[latex]m=z^{*}\\cdot\\frac{\\sigma }{\\sqrt{n}}[\/latex]\u00a0is the formula to determine m for a given n, we can use simple algebra to express n in terms of m (multiply both sides by the square root of n, divide both sides by m, and square both sides) to get<\/p>\r\n[latex]n=\\left(\\frac{z^{*}\\sigma}{m}\\right)^2[\/latex]\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div id=\"N10C8D\" class=\"section\">\r\n<div 
class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Comment<\/span><\/h2>\r\n<p id=\"N10C94\">Clearly, the sample size n must be an integer. In the previous example we got n = 1,600, but in other situations, the calculation may give us a non-integer result. In these cases, we should always\u00a0<em>round up to the next integer.<\/em><\/p>\r\n<p id=\"N10C9A\">Using this \u201cconservative approach,\u201d we\u2019ll achieve an interval at least as narrow as the one desired.<\/p>\r\n\r\n<div class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p id=\"N10C9F\">IQ scores are known to vary normally with a standard deviation of 15. How many students should be sampled if we want to estimate the population mean IQ at 99% confidence with a margin of error equal to 2?<\/p>\r\n[latex]n=\\left(\\frac{z^{*}\\sigma}{m}\\right)^2=\\left(\\frac{2.576\\left(15\\right)}{2}\\right)^2\\approx373.26[\/latex]\r\n<p id=\"N10D13\">Round up to be safe, and take a sample of 374 students.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\n[h5p id=\"219\"]\r\n\r\n<\/div>\r\n<\/div>\r\n<div id=\"N10D3B\" class=\"section\">\r\n<div class=\"sectionContain \">\r\n<h2>Comment<\/h2>\r\n<p id=\"N10D42\">In the preceding activity, you saw that in order to calculate the sample size when planning a study, you needed to know the population standard deviation, sigma (\u03c3). In practice, sigma is usually not known because, like \u03bc, it is a population parameter. 
(The rare exceptions are certain variables, such as IQ scores or standardized test scores, whose scales may be constructed to have a particular known sigma.)<\/p>\r\n<p id=\"N10D44\">Therefore, when researchers wish to compute the required sample size in preparation for a study, they use an\u00a0<em>estimate<\/em>\u00a0of sigma. Usually, sigma is estimated from the standard deviation obtained in prior studies.<\/p>\r\n<p id=\"N10D49\">However, in some cases, there might not be any prior studies on the topic. In such instances, a researcher still needs to get a rough estimate of the standard deviation of the (yet-to-be-measured) variable, in order to determine the required sample size for the study. One way to get such a rough estimate is with the \"range rule of thumb,\" which you will practice in the following activity.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<p id=\"N10D50\">The purpose of the next activity is to give you some experience with a method for roughly estimating sigma (\u03c3, the population standard deviation) when no prior studies are available, in order to compute sample size when planning a first study.<\/p>\r\n\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\n[h5p id=\"220\"]\r\n\r\n<\/div>\r\n<\/div>\r\n<p id=\"e926c636a7d64fab852a5ea6fc0aff31\">We are almost done with this section. 
We need to discuss just a few more questions:<\/p>\r\n\r\n<ul id=\"a4aa87d73dec4341b6c29d5ac0ddbf3c\">\r\n \t<li>\r\n<p id=\"af1141518e8dd4d1aa31b63d2ff0c6e5b\">Is it always okay to use the confidence interval we developed for \u03bc when \u03c3 is known?<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"adda3bccbe99242e7a0133ca2ec830923\">What if \u03c3 is unknown?<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"af820d6216c0b41c19fac8607140577a0\">How can we use statistical software to calculate confidence intervals for us?<\/p>\r\n<\/li>\r\n<\/ul>\r\n<div id=\"d8d1339cce934ab290950d7904aafaed\" class=\"section purposewrap\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">When Is It Safe to Use the Confidence Interval We Developed?<\/span><\/h2>\r\n<p id=\"fc23cfe2184a419daac2566fd2ca33a8\">With any inference method, one of the most important things to learn is the set of conditions under which it is safe to use. It is very tempting to apply a method regardless, but if the conditions under which the method was developed are not met, using it can produce unreliable results, which in turn can lead to wrong and\/or misleading conclusions. As you\u2019ll see throughout this section, we always discuss the conditions under which each method can be safely used.<\/p>\r\n<p id=\"d7086d37e1d94ef8ab60eb11c491543e\">In particular, the confidence interval for \u03bc (when \u03c3 is known),\u00a0[latex]\\bar{x}\\pm z^{*}\\cdot\\frac{\\sigma }{\\sqrt{n}}[\/latex], was developed assuming that the sampling distribution of\u00a0[latex]\\overline{X}[\/latex]\u00a0is normal; in other words, that the Central Limit Theorem applies. 
This normality is what allowed us to determine the values of z*, the confidence multiplier, for different levels of confidence.<\/p>\r\n<p id=\"cb460886354b47ab85b316cf50c88388\">First,\u00a0<em class=\"italic\">the sample must be random.<\/em>\u00a0Assuming that the sample is random, recall from the Probability unit that the sampling distribution of\u00a0[latex]\\overline{X}[\/latex]\u00a0is (at least approximately) normal when the\u00a0<em class=\"italic\">sample size is large<\/em>\u00a0(by the Central Limit Theorem; a common rule of thumb for \u201clarge\u201d is n &gt; 30), or, for\u00a0<em class=\"italic\">smaller sample sizes<\/em>, when it is known that the quantitative\u00a0<em class=\"italic\">variable<\/em>\u00a0of interest is\u00a0<em class=\"italic\">distributed normally<\/em>\u00a0in the population. The only situation in which we cannot use the confidence interval, then, is when the sample size is small and the variable of interest is not known to have a normal distribution. In that case, other methods, called nonparametric methods, which are beyond the scope of this course, need to be used. This can be summarized in the following table:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"a5c5bd93cef74b7786c187d516525057\" class=\"img-responsive popimg aligncenter\" title=\"A table with two columns and two rows. 
The column headings are: &quot;Small Sample Size&quot; and &quot;Large Sample Size.&quot; The row headings are &quot;Variable varies normally&quot; and &quot;Variable doesn't vary normally.&quot; Here is the data in the table by cell in &quot;Row, Column: Value&quot; format: Variable varies normally, Small sample size: OK; Variable varies normally, Large sample size: OK; Variable doesn't vary normally, Small sample size: NOT OK; Variable doesn't vary normally, Large sample size: OK.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image077.gif\" alt=\"A table with two columns and two rows. The column headings are: &quot;Small Sample Size&quot; and &quot;Large Sample Size.&quot; The row headings are &quot;Variable varies normally&quot; and &quot;Variable doesn't vary normally.&quot; Here is the data in the table by cell in &quot;Row, Column: Value&quot; format: Variable varies normally, Small sample size: OK; Variable varies normally, Large sample size: OK; Variable doesn't vary normally, Small sample size: NOT OK; Variable doesn't vary normally, Large sample size: OK.\" \/><\/span><\/span>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p id=\"N10075\">Below are four different situations in which a confidence interval for \u03bc is called for.<\/p>\r\n<p id=\"N10077\"><em>Situation A:<\/em>\u00a0In order to estimate \u03bc, the mean annual salary of high-school teachers in a certain state, a random sample of 150 teachers was chosen and their average salary was found to be $38,450. 
From past experience, it is known that teachers' salaries have a standard deviation of $5,000.<\/p>\r\n<p id=\"N1007B\"><em>Situation B:<\/em>\u00a0A medical researcher wanted to estimate \u03bc, the mean recovery time from open-heart surgery for males between the ages of 50 and 60. The researcher followed the next 15 male patients in this age group who underwent open-heart surgery in his medical institute through their recovery period. (Comment: Even though the sample was not strictly random, there is no reason to believe that the sample of \"the next 15 patients\" introduces any bias, so it is as good as a random sample). The mean recovery time of the 15 patients was 26 days. From the large body of research that was done in this area, it is assumed that recovery times from open-heart surgery have a standard deviation of 3 days.<\/p>\r\n<p id=\"N1007F\"><em>Situation C:<\/em>\u00a0In order to estimate \u03bc, the mean score on the quantitative reasoning part of the GRE (Graduate Record Examination) of all MBA students, a random sample of 1,200 MBA students was chosen, and their scores were recorded. The sample mean was found to be 590. It is known that the quantitative reasoning scores on the GRE vary normally with a standard deviation of 150.<\/p>\r\n<p id=\"N10083\"><em>Situation D:<\/em>\u00a0A psychologist wanted to estimate \u03bc, the mean time it takes 6-year-old children diagnosed with Down's Syndrome to complete a certain cognitive task. A random sample of 12 children was chosen and their times were recorded. The average time it took the 12 children to complete the task was 7.5 minutes. 
From past experience with similar tasks, the time is known to vary normally with a standard deviation of 1.3 minutes.<\/p>\r\n[h5p id=\"221\"]\r\n<p id=\"a7f4998a72bd467d92bcd656a24f923c\">Below are four different situations in which a confidence interval formula would be useful:<\/p>\r\n<p id=\"f01945b34ce545b2b959d7a44f891b02\"><em class=\"italic\">Situation A:<\/em>\u00a0A marketing executive wants to estimate the average time, in days, that a watch battery will last. She tests 50 randomly selected batteries and finds that the distribution is skewed to the left, since a couple of the batteries were defective. It is known from past experience that the standard deviation is 25 days.<\/p>\r\n<p id=\"cb9235a3672a4769a0a6600f3f561f6a\"><em class=\"italic\">Situation B:<\/em>\u00a0A college professor desires an estimate of the mean number of hours per week that full-time college students are employed. He randomly selected 250 college students and found that they worked a mean time of 18.6 hours per week. He uses previously known data for his standard deviation.<\/p>\r\n<p id=\"cfaa1036e2ae4e59bbd0ca3ce1fe9ed6\"><em class=\"italic\">Situation C:<\/em>\u00a0A medical researcher at a sports medicine clinic uses 35 volunteers from the clinic to study the average number of hours the typical American exercises per week. It is known that hours of exercise are normally distributed and past data give him a standard deviation of 1.2 hours.<\/p>\r\n<p id=\"d4f614ad35a14c0393ea2371914072f6\"><em class=\"italic\">Situation D:<\/em>\u00a0A high-end auto manufacturer tests 5 randomly selected cars to find out the damage caused by a 5 mph crash. It is known that this distribution is normal. 
Assume that the standard deviation is known.<\/p>\r\n\r\n<div class=\"asx \">\r\n<div id=\"du4_m1_confintmean7_digt_tutor1\" class=\"activitywrap sectionNest flash\">\r\n<div class=\"activityhead\">\r\n<div class=\"activityinfo\"><\/div>\r\n<\/div>\r\n<div class=\"actContain\">\r\n<div class=\"activity flash\">\r\n<div id=\"u4_m1_confintmean7_digt_tutor1\" class=\"flash_obj asx testFlash mark_flash\">\r\n<div id=\"ou4_m1_confintmean7_digt_tutor1\" class=\"page 2963021\">\r\n<div id=\"2963021\" class=\"question\">[h5p id=\"222\"]<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div>\r\n<div class=\"section purposewrap\">\r\n<div class=\"sectionContain\">\r\n<p id=\"e1b9734a819a407cb80b4d44ac889fe5\"><em class=\"italic\">What if\u00a0<\/em>\u03c3\u00a0<em class=\"italic\">is unknown?<\/em><\/p>\r\n<p id=\"b40d6877b5894422aff7930d7c9009ac\">As we discussed earlier, when variables have been well-researched in different populations it is reasonable to assume that the population standard deviation (\u03c3) is known. However, this is rarely the case. What if \u03c3 is unknown?<\/p>\r\n<p id=\"babe4722279c4920ad165c821861bb4d\">Well, there is some good news and some bad news.<\/p>\r\n<p id=\"ef8d299b51b04e039ea1c6ef56d5bb22\">The good news is that we can easily replace the population standard deviation, \u03c3, with the\u00a0<em class=\"italic\">sample<\/em>\u00a0standard deviation, s.<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"b9c36bf450c14920a7ba79c4f46caaca\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population of interest. \u03bc is unknown and \u03c3 is unknown. From the population we create a SRS of size n, represented by a smaller circle. We can find x-bar for this SRS, and we can also obtain S. 
We use this instead of the unknown \u03c3.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image078.gif\" alt=\"A large circle represents the population of interest. \u03bc is unknown and \u03c3 is unknown. From the population we create a SRS of size n, represented by a smaller circle. We can find x-bar for this SRS, and we can also obtain S. We use this instead of the unknown \u03c3.\" \/><\/span><\/span>\r\n<p id=\"ad28be4f62ac44ed9bbd2f403825ce3b\">The bad news is that once \u03c3 has been replaced by s, the standardized sample mean, [latex]\\frac{\\overline{X}-\\mu}{s\/\\sqrt{n}}[\/latex], no longer follows the standard normal distribution: the extra variability that comes from estimating \u03c3 with s makes it more spread out. Therefore, the confidence multipliers z* for the different levels of confidence (1.645, 2, 2.576) are (generally) no longer accurate. The new multipliers come from a different distribution called the \u201ct distribution\u201d and are therefore denoted by t* (instead of z*). We will discuss the t distribution in more detail when we talk about hypothesis testing.<\/p>\r\n<p id=\"f233d0f9df3b4c859e7901b9ebb2522d\">The confidence interval for the population mean (\u03bc) when \u03c3 is unknown is therefore:<\/p>\r\n[latex]\\bar{x}\\pm t^{*}\\cdot\\frac{s}{\\sqrt{n}}[\/latex]\r\n<p id=\"c8a0354558694dbc8e8c642f1cb53495\">(Note that this interval is very similar to the one when \u03c3 is known, with the obvious changes: s replaces \u03c3, and t* replaces z* as discussed above.)<\/p>\r\n<p id=\"c59fa88aeb2b425c9213365feffb0efa\">There is an important difference between the confidence multipliers we have used so far (z*) and those needed for the case when \u03c3 is unknown (t*). 
Unlike the confidence multipliers we have used so far (z*), which depend only on the level of confidence, the new multipliers (t*) have the\u00a0<em class=\"italic\">added complexity<\/em>\u00a0that they\u00a0<em class=\"italic\">depend on both the level of confidence and the sample size<\/em>\u00a0(through the degrees of freedom, n \u2212 1; for example, the t* used in a 95% confidence interval when n = 10 is different from the t* used when n = 40). Due to this added complexity in determining the appropriate t*, we will rely heavily on software in this case.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div id=\"aa8aab1bd3934f93b4266e13c4a063ea\" class=\"section purposewrap\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Comments<\/span><\/h2>\r\n<ol>\r\n \t<li id=\"f820c0b50aac4393a0a083de956bdfe0\">Since it is quite rare that \u03c3 is known, this interval (sometimes called a <em class=\"italic\">one-sample t confidence interval<\/em>) is more commonly used as the confidence interval for estimating \u03bc. (Nevertheless, we could not have presented it without our extended discussion up to this point, which also provided you with a solid understanding of confidence intervals.)<\/li>\r\n \t<li id=\"af53d45f69d94a018c1959cf48e1d25c\">The quantity [latex]\\frac{s}{\\sqrt{n}}[\/latex] is called the\u00a0<em class=\"italic\">standard error<\/em>\u00a0of\u00a0[latex]\\overline{X}[\/latex]. As we saw earlier, [latex]\\frac{\\sigma }{\\sqrt{n}}[\/latex] is the\u00a0<em class=\"italic\">standard deviation<\/em>\u00a0of\u00a0[latex]\\overline{X}[\/latex] (and this is the quantity used in the confidence interval when \u03c3 is known). In general, whenever we replace parameters with their sample counterparts in the standard deviation of a statistic, the resulting quantity is called the standard error of the statistic. 
In this case, we replaced \u03c3 with its sample counterpart (s), and thus\u00a0[latex]\\frac{s}{\\sqrt{n}}[\/latex]\u00a0is the\u00a0<em class=\"italic\">standard error<\/em>\u00a0of (the statistic)\u00a0[latex]\\overline{X}[\/latex].<\/li>\r\n \t<li id=\"ad7a9bdd565a4dde97fa7fe4dce87f04\">As before, to safely use this confidence interval, the sample <em class=\"italic\">must be random<\/em>, and the only case when this interval cannot be used is when the sample size is small and the variable is not known to vary normally.<\/li>\r\n<\/ol>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\n[h5p id=\"223\"]\r\n\r\n<\/div>\r\n<\/div>\r\n<h2><span title=\"Quick scroll up\">Final comment<\/span><\/h2>\r\n<p id=\"dcbcf0c7629e4233bde8eae1030c35a2\">It turns out that for large values of n, the t* multipliers are not that different from the z* multipliers, and therefore using the interval formula:<\/p>\r\n[latex]\\bar{x}\\pm z^{*}\\cdot\\frac{s}{\\sqrt{n}}[\/latex]\r\n<p id=\"c25ac0eb8c9f4bf59941bd2147d71f6f\">for \u03bc when \u03c3 is unknown provides a pretty good approximation.<\/p>\r\n\r\n<h2><span title=\"Quick scroll up\">Let\u2019s summarize<\/span><\/h2>\r\n<p id=\"cc9aef71106649219eaf7b3bc4d55c98\">* When the sample is random and the population is normal and\/or the sample is large, a confidence interval for the unknown population mean \u03bc when \u03c3 is known is:<\/p>\r\n[latex]\\bar{x}\\pm z^{*}\\cdot\\frac{\\sigma }{\\sqrt{n}}[\/latex], where z* is 1.645 for 90% confidence, 2 (more precisely, 1.96) for 95% confidence, and 2.576 for 99% confidence.\r\n<p id=\"b69b41268d704e14bb370b2a7703b028\">* There is a trade-off between the level of confidence and the precision of the interval estimation. 
For a fixed sample size, the price we pay for more precision is a lower level of confidence.<\/p>\r\n<p id=\"a99e5aa935b2475daf1190048ceb301f\">* The general form of confidence intervals is an estimate \u00b1 the margin of error (m). In this case, the estimate is\u00a0[latex]\\overline{x}[\/latex]\u00a0and\u00a0[latex]m=z^{*}\\cdot\\frac{\\sigma }{\\sqrt{n}}[\/latex]. The confidence interval is therefore centered at the estimate and its width is exactly 2m.<\/p>\r\n<p id=\"e048e0b2133e4c3a8809c51146e2d544\">* For a given level of confidence, the width of the interval depends on the sample size. We can therefore do a sample size calculation to figure out what sample size is needed in order to get a confidence interval with a desired margin of error m, and a certain level of confidence (assuming we have some flexibility with the sample size). To do the sample size calculation we use:<\/p>\r\n[latex]n=\\left(\\frac{z^{*}\\sigma}{m}\\right)^2[\/latex]\r\n<p id=\"f6889a0cf34643efac5a7abc4533b138\">(and round\u00a0<em class=\"italic\">up<\/em>\u00a0to the next integer).<\/p>\r\n<p id=\"c3ca044d4d624a1e984ee1cc715d182f\">* When \u03c3 is unknown, we use the sample standard deviation, s, instead, but as a result we also need to use a different set of confidence multipliers (t*) associated with the t distribution. The interval is therefore<\/p>\r\n[latex]\\bar{x}\\pm t^{*}\\cdot\\frac{s}{\\sqrt{n}}[\/latex]\r\n<p id=\"c8e19115a4114af5a8ecaf5f1f0dafe5\">* These new multipliers have the added complexity that they depend not only on the level of confidence, but also on the sample size. 
Software is therefore very useful for calculating confidence intervals in this case.<\/p>\r\n<p id=\"cc6aac5cc77e495eacf7746ce1751300\">* For large values of n, the t* multipliers are not that different from the z* multipliers, and therefore using the interval formula:<\/p>\r\n[latex]x\\pm z^{*}*\\frac{s}{\\sqrt{n}}[\/latex]\r\n<p id=\"b7897adc0eb14a86a50b80d73c120f21\">for \u03bc when \u03c3 is unknown provides a pretty good approximation.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>","rendered":"<div id=\"lobjh\" class=\"multi\">\n<div class=\"textbox textbox--learning-objectives\">\n<header class=\"textbox__header\">\n<h2 class=\"textbox__title\">Learning Objectives<\/h2>\n<\/header>\n<div class=\"textbox__content\">\n<ul>\n<li id=\"explain_confidence_interval\">Explain what a confidence interval represents and determine how changes in sample size and confidence level affect the precision of the confidence interval.<\/li>\n<li id=\"find_confidence_intervals\">Find confidence intervals for the population mean and the population proportion (when certain conditions are met), and perform sample size calculations.<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"N10AFC\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Overview<\/span><\/h2>\n<p id=\"N10B03\">As we mentioned in the introduction to interval estimation, we start by discussing interval estimation for the population mean \u03bc. 
Here is a quick overview of how we introduce this topic.<\/p>\n<ul>\n<li>Learn how a 95% confidence interval for the population mean \u03bc is constructed and interpreted.<\/li>\n<li>Generalize to confidence intervals with other levels of confidence (for example, what if we want a 99% confidence interval?).<\/li>\n<li>Understand more broadly the structure of a confidence interval and the importance of the margin of error.<\/li>\n<li>Understand how the precision of interval estimation is affected by the confidence level and sample size.<\/li>\n<li>Learn under which conditions we can safely use the methods that are introduced in this section.<\/li>\n<\/ul>\n<p id=\"N10B1A\">Recall the IQ example:<\/p>\n<\/div>\n<\/div>\n<div class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p id=\"N10B21\">Suppose that we are interested in studying the IQ levels of students at Smart University (SU). In particular (since IQ level is a quantitative variable), we are interested in estimating \u03bc, the mean IQ level of all the students at SU.<\/p>\n<p id=\"N10B24\">We will assume that from past research on IQ scores in different universities, it is known that the IQ standard deviation in such populations is \u03c3 = 15. In order to estimate \u03bc , a random sample of 100 SU students was chosen, and their (sample) mean IQ level is calculated (let\u2019s not assume, for now, that the value of this sample mean is 115, as before).<\/p>\n<p id=\"N10B38\"><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"_i_0\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the Population of all Students at SU. We are interested in the variable IQ, and the unknown parameter is \u03bc, the population mean IQ level. In addition, we know that \u03c3 = 15. 
From this population we create a sample of size n=100, represented by a smaller circle. In this sample, we need to find x bar\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image033.gif\" alt=\"A large circle represents the Population of all Students at SU. We are interested in the variable IQ, and the unknown parameter is \u03bc, the population mean IQ level. In addition, we know that \u03c3 = 15. From this population we create a sample of size n=100, represented by a smaller circle. In this sample, we need to find x bar\" \/><\/span><\/span><\/p>\n<p>We will now show the rationale behind constructing a 95% confidence interval for the population mean \u03bc.<\/p>\n<p id=\"N10B41\">* We learned in the \u201cSampling Distributions\u201d module of probability that, according to the central limit theorem, the sampling distribution of the sample mean [latex]\\overline{X}[\/latex] is approximately normal with a mean of \u03bc and standard deviation of [latex]\\frac{\\sigma}{\\sqrt{n}}[\/latex]. In our example, then (\u03c3 = 15 and n = 100), the possible values of [latex]\\overline{X}[\/latex], the sample mean IQ level of 100 randomly chosen students, are approximately normally distributed, with mean \u03bc and standard deviation [latex]\\frac{15}{\\sqrt{100}}=1.5[\/latex].<\/p>\n<p id=\"N10BA8\">* Next, we recall and apply the Standard Deviation Rule for the normal distribution, and in particular its second part:<\/p>\n<p id=\"N10BAB\">There is a 95% chance that the sample mean we get in our sample falls within 2 * 1.5 = 3 of \u03bc.<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"_i_1\" class=\"img-responsive popimg aligncenter\" title=\"A distribution curve with a horizontal axis labeled &quot;X bar.&quot; The curve is centered at X-bar=\u03bc, and marked on the axis is \u03bc+3 and \u03bc-3. 
There is a 95% chance that any x-bar will fall between \u03bc-3 and \u03bc+3.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/2_iq.png\" alt=\"A distribution curve with a horizontal axis labeled &quot;X bar.&quot; The curve is centered at X-bar=\u03bc, and marked on the axis is \u03bc+3 and \u03bc-3. There is a 95% chance that any x-bar will fall between \u03bc-3 and \u03bc+3.\" height=\"350\" \/><\/span><\/span><\/p>\n<p id=\"N10BB5\">* Obviously, if there is a certain distance between the sample mean and the population mean, we can describe that distance by starting at either value. So, if the sample mean ([latex]\\overline{x}[\/latex]) falls within a certain distance of the population mean \u03bc, then the population mean \u03bc falls within the same distance of the sample mean.<\/p>\n<p id=\"N10BC8\">Therefore, the statement, \u201cThere is a 95%\u00a0<em>chance<\/em>\u00a0that the\u00a0<em>sample<\/em>\u00a0mean\u00a0[latex]\\overline{x}[\/latex]\u00a0falls within 3 units of \u03bc\u201d can be rephrased as: \u201cWe are 95%\u00a0<em>confident<\/em>\u00a0that the\u00a0<em>population<\/em>\u00a0mean \u03bc falls within 3 units of\u00a0[latex]\\overline{x}[\/latex].\u201d<\/p>\n<p id=\"N10BF7\">So, if we happen to get a sample mean of\u00a0[latex]\\overline{x}=115[\/latex], then we are 95% sure that \u03bc falls within 3 of 115, or in other words that \u03bc is covered by the interval (115 \u2013 3, 115 + 3) = (112, 118).<\/p>\n<p id=\"N10C10\">(On later pages, we will use similar reasoning to develop a general formula for a confidence interval.)<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"N10C14\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Comment<\/span><\/h2>\n<p id=\"N10C1B\">Note that the first phrasing is about\u00a0[latex]\\overline{x}[\/latex], which is a random variable; that\u2019s why it makes sense to use 
probability language. But the second phrasing is about \u03bc, which is a parameter, and thus is a \u201cfixed\u201d value that doesn\u2019t change, and that\u2019s why we shouldn\u2019t use probability language to discuss it. This point will become clearer after you do the activities on the next page.<\/p>\n<div id=\"f12d872c2d2c4e50b7ba1628f01b94f2\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">The General Case<\/span><\/h2>\n<\/div>\n<\/div>\n<p id=\"c4df7cfa0bd94e5a8b37bc87a6939387\">Let\u2019s generalize the IQ example. Suppose that we are interested in estimating the unknown population mean (\u03bc) based on a random sample of size n. Further, we assume that the population standard deviation (\u03c3) is known.<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"d78e18e3028c41d9ac4b9b29364108a2\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population of interest. \u03bc is unknown, but \u03c3 is known about the population. From the population we create a SRS of size n, represented by a smaller circle. We can find x-bar for this SRS.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image036.gif\" alt=\"A large circle represents the population of interest. \u03bc is unknown, but \u03c3 is known about the population. From the population we create a SRS of size n, represented by a smaller circle. We can find x-bar for this SRS.\" \/><\/span><\/span><\/p>\n<p id=\"e0ed4af979a5434bb7fdcd3658ce130b\">The values of\u00a0[latex]\\overline{x}[\/latex]\u00a0follow a normal distribution with (unknown) mean \u03bc and standard deviation\u00a0[latex]\\frac{\\sigma }{\\sqrt{n}}[\/latex]\u00a0(known, since both \u03c3 and n are known). 
By the (second part of the) Standard Deviation Rule, this means that:<\/p>\n<p id=\"a023622336d94ea1b45c32cb66a0f93a\">There is a 95% chance that our sample mean ([latex]\\overline{x}[\/latex]) will fall within\u00a0[latex]2*\\frac{\\sigma }{\\sqrt{n}}[\/latex]\u00a0of \u03bc,<\/p>\n<p id=\"db9a948eef784491aea4dd9f31d6b96c\">which means that:<\/p>\n<p id=\"f923f5db00184ea2b08a99a3cc19d0a0\">We are 95% confident that \u03bc falls within [latex]2*\\frac{\\sigma}{\\sqrt{n}}[\/latex] of our sample mean ([latex]\\overline{x}[\/latex]).<\/p>\n<p id=\"bc2ed2a5f4214858965eb025fbe3e203\">Or, in other words, a 95% confidence interval for the population mean \u03bc is:<\/p>\n<p>[latex]\\left ( \\bar{x}-2*\\frac{\\sigma }{\\sqrt{n}},\\bar{x}+2*\\frac{\\sigma }{\\sqrt{n}} \\right )[\/latex]<\/p>\n<p id=\"adaf3869fce4426ab8081751e3a299e6\">Here, then, is the\u00a0<em class=\"italic\">general result:<\/em><\/p>\n<p id=\"be60f7aa5f6e49e986dad5bf77868cbd\">Suppose a random sample of size n is taken from a normal population of values for a quantitative variable whose mean (\u03bc) is unknown and whose standard deviation (\u03c3) is known. A 95% confidence interval (CI) for \u03bc is:<\/p>\n<p>[latex]\\bar{x}\\pm2*\\frac{\\sigma}{\\sqrt{n}}[\/latex]<\/p>\n<div id=\"cbd10bf98c704267a9c424db1386f3ca\" class=\"section purposewrap\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Comment<\/span><\/h2>\n<p id=\"afdb9e3cc56944e2b3f514195af2ec1e\">Note that for now we require the population standard deviation (\u03c3) to be known. In practice, \u03c3 is rarely known, but in some cases, especially when a lot of research has been done on the quantitative variable whose mean we are estimating (such as IQ, height, weight, or scores on standardized tests), it is reasonable to assume that \u03c3 is known. 
Eventually, we will see how to proceed when \u03c3 is unknown and must be estimated with the sample standard deviation (s).<\/p>\n<p id=\"b1fe25389bf34e8bbbf719782fe336d0\">Let\u2019s look at another example.<\/p>\n<div id=\"f4d4b73fafe44f9586233109884ce00b\" class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div>\n<p id=\"f1496d1bcc23488ba91939cd3c5b099f\">An educational researcher was interested in estimating \u03bc, the mean score on the math part of the SAT (SAT-M) of all community college students in his state. To this end, the researcher chose a random sample of 650 community college students from his state and found that their average SAT-M score is 475. Based on a large body of research that was done on the SAT, it is known that the scores roughly follow a normal distribution with the standard deviation\u00a0[latex]\\sigma=100[\/latex].<\/p>\n<p id=\"f87612f13ef8410baafe9c1791a9990e\">Here is a visual representation of this story, which summarizes the information provided:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"c5be519f385c406fac212e13698dc0ef\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population of community college students in the researcher\u2019s state. 
\u03bc is unknown, but \u03c3 is known about the population. From the population we create an SRS of size n=650, represented by a smaller circle. We can find that x-bar=475 for this SRS.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image045.gif\" alt=\"A large circle represents the population of community college students in the researcher\u2019s state. \u03bc is unknown, but \u03c3 is known about the population. From the population we create an SRS of size n=650, represented by a smaller circle. We can find that x-bar=475 for this SRS.\" \/><\/span><\/span><\/p>\n<p id=\"ae681d2b1a484e12bc4ac91b55da58a3\">Based on this information, let\u2019s estimate \u03bc with a 95% confidence interval.<\/p>\n<p id=\"ab817cc09c1b4870887714af028f693f\">Using the formula we developed before,\u00a0[latex]\\bar{x}\\pm2*\\frac{\\sigma}{\\sqrt{n}}[\/latex], a 95% confidence interval for \u03bc is:<\/p>\n<p>[latex]\\left(475-2*\\frac{100}{\\sqrt{650}},475+2*\\frac{100}{\\sqrt{650}}\\right)[\/latex], which is (475 \u2013 7.8, 475 + 7.8) = (467.2, 482.8). In this case, it makes sense to round, since SAT scores can only be whole numbers, and say that the 95% confidence interval is (467, 483).<\/p>\n<p id=\"d6856c208fa0492e9b1448bb11e4c9f5\">We are not done yet. An equally important part is to\u00a0<em class=\"italic\">interpret what this means in the context of the problem.<\/em><\/p>\n<p id=\"c7ebd531c5d04c30a56f11c17b4b9968\">We are 95% confident that the mean SAT-M score of all community college students in the researcher\u2019s state is covered by the interval (467, 483). 
Note that the confidence interval was obtained by taking\u00a0[latex]475\\pm 8[\/latex]\u00a0(rounded). This means that we are 95% confident that by using the sample mean ([latex]\\overline{x}=475[\/latex]) to estimate \u03bc, our error is no more than 8.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div id=\"h5p-214\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-214\" class=\"h5p-iframe\" data-content-id=\"214\" style=\"height:1px\" src=\"about:blank\" 
frameBorder=\"0\" scrolling=\"no\" title=\"9.4 Learn by doing 1\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"section purposewrap\">\n<div class=\"sectionContain\">\n<p id=\"c89204d503a248e9b66a31b04f3b5b50\">We just saw that one interpretation of a 95% confidence interval is that we are 95% confident that the population mean (\u03bc) is contained in the interval. Another useful interpretation in practice is that, given the data, the confidence interval represents the set of plausible values for the population mean \u03bc.<\/p>\n<div id=\"ca878b0fea5a4dafa894857dc7c0ccc5\" class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div>\n<p id=\"be62bc4621b04dc1b1eafd9ac6361bb4\">As an illustration, let\u2019s return to the example of the mean SAT-M score of community college students. Recall that we constructed the confidence interval (467, 483) for the unknown mean SAT-M score of all community college students.<\/p>\n<p id=\"d6b2be48039c40c4a16c214bae93bb85\">Here is one way we can use the confidence interval:<\/p>\n<p id=\"c4adfa7de2564fa189996bb80c998e5c\">Do the results of this study provide evidence that \u03bc, the mean SAT-M score of community college students, is lower than the mean SAT-M score in the general population of college students in that state (which is 480)?<\/p>\n<p id=\"ca80c6a8b56c41088b398b22ddad7202\">The 95% confidence interval for \u03bc was found to be (467, 483). 
Note that 480, the mean SAT-M score in the general population of college students in that state, falls inside the interval, which means that it is one of the plausible values for \u03bc.<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"b94ae08c9ba845fb97233f2771718e6d\" class=\"img-responsive popimg aligncenter\" title=\"A number line, on which the 95% confidence interval for \u03bc has been marked, from 467 to 483. At 480 is the mean SAT-M score in the general population of college students in the state.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image050.gif\" alt=\"A number line, on which the 95% confidence interval for \u03bc has been marked, from 467 to 483. At 480 is the mean SAT-M score in the general population of college students in the state.\" \/><\/span><\/span><\/p>\n<p id=\"d39b6a2551694d519b42d10c72608c6e\">This means that \u03bc could be 480 (or even higher, up to 483), and therefore we cannot conclude that the mean SAT-M score among community college students in the state is lower than the mean in the general population of college students in that state. 
(Note that the fact that most of the plausible values for \u03bc fall below 480 is not a consideration here.)<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"f5d6bc2e92734e0d9ff7ce1659ec4114\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Comment<\/span><\/h2>\n<p id=\"f94118dc1e2b4b34a509356fdd6045ea\">Recall that in the formula for the 95% confidence interval for \u03bc,\u00a0[latex]\\bar{x}\\pm 2*\\frac{\\sigma }{\\sqrt{n}}[\/latex], the 2 comes from the Standard Deviation Rule, which says that any normal random variable (in our case\u00a0[latex]\\overline{X}[\/latex]) has a 95% chance (or probability of 0.95) of taking a value that is within 2 standard deviations of its mean.<\/p>\n<p id=\"adfb8adcd0d14f9496cbfeb17e8e2870\">As you recall from the discussion about the normal random variable, this is only an approximation; more precisely, there is a 95% chance that a normal random variable will take a value within 1.96 standard deviations of its mean. Therefore, a more accurate formula for the 95% confidence interval for \u03bc is [latex]\\bar{x}\\pm 1.96*\\frac{\\sigma }{\\sqrt{n}}[\/latex], which you\u2019ll find in most introductory statistics books. In this course, we\u2019ll use 2 (and not 1.96), which is close enough for our purposes.<\/p>\n<h2><span title=\"Quick scroll up\">Other Levels of Confidence<\/span><\/h2>\n<p id=\"N10B23\">The most commonly used level of confidence is 95%. However, we may wish to increase our level of confidence and produce an interval that is almost certain to contain \u03bc. 
Specifically, we may want to report an interval for which we are 99% confident, rather than only 95% confident, that it contains the unknown population mean.<\/p>\n<p id=\"N10B26\">Using the same reasoning as in the last comment, in order to create a 99% confidence interval for \u03bc, we should ask: within how many standard deviations of its mean does a normal random variable fall with probability 0.99? The precise answer is 2.576, and therefore a 99% confidence interval for \u03bc is [latex]\\bar{x}\\pm2.576*\\frac{\\sigma}{\\sqrt{n}}[\/latex].<\/p>\n<p id=\"N10B55\">Another commonly used level of confidence is 90%. Since there is a probability of 0.90 that any normal random variable takes a value within 1.645 standard deviations of its mean, the 90% confidence interval for \u03bc is [latex]\\bar{x}\\pm1.645*\\frac{\\sigma}{\\sqrt{n}}[\/latex].<\/p>\n<div class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p id=\"N10B86\">Let\u2019s go back to our first example, the IQ example:<\/p>\n<p id=\"N10B89\">The IQ level of students at a particular university has an unknown mean, \u03bc, and a known standard deviation,\u00a0\u03c3 = 15. A simple random sample of 100 students is found to have a sample mean IQ of\u00a0[latex]\\overline{x}=115[\/latex]. 
Estimate \u03bc with 90%, 95%, and 99% confidence intervals.<\/p>\n<p id=\"N10BB3\">A 90% confidence interval for \u03bc is\u00a0[latex]\\bar{x}\\pm 1.645\\frac{\\sigma }{\\sqrt{n}}=115\\pm 1.645\\left ( \\frac{15}{\\sqrt{100}} \\right )=115\\pm 2.5=(112.5, 117.5)[\/latex].<\/p>\n<p id=\"N10C37\">A 95% confidence interval for \u03bc is\u00a0[latex]\\bar{x}\\pm 2\\frac{\\sigma }{\\sqrt{n}}=115\\pm 2\\left ( \\frac{15}{\\sqrt{100}} \\right )=115\\pm 3.0=(112, 118)[\/latex].<\/p>\n<p id=\"N10CA3\">A 99% confidence interval for \u03bc is\u00a0[latex]\\bar{x}\\pm2.576*\\frac{\\sigma}{\\sqrt{n}}=115\\pm2.576\\left(\\frac{15}{\\sqrt{100}}\\right)=115\\pm 3.9\\approx\\left(111,119\\right)[\/latex].<\/p>\n<\/div>\n<\/div>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div id=\"h5p-215\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-215\" class=\"h5p-iframe\" data-content-id=\"215\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"9.4 Did I get this\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<p>Note from the previous example and the previous &#8220;Did I Get This?&#8221; activity that the more confidence I require, the wider the confidence interval for \u03bc (pronounced &#8220;mu&#8221;). The 99% confidence interval is wider than the 95% confidence interval, which is wider than the 90% confidence interval.<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"A number line illustrating confidence intervals for \u03bc. x-bar is marked at 115. The interval 112.5 and 117.5 is the 90% confidence interval. Enclosing this interval is the interval 112 and 118, which is the 95% confidence interval. 
Even larger is the 99% confidence interval, ranging from 111 to 119.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image058.gif\" alt=\"A number line illustrating confidence intervals for \u03bc. x-bar is marked at 115. The interval 112.5 and 117.5 is the 90% confidence interval. Enclosing this interval is the interval 112 and 118, which is the 95% confidence interval. Even larger is the 99% confidence interval, ranging from 111 to 119.\" \/><\/span><\/span><\/p>\n<p id=\"N10D49\">This is not very surprising, given that in the 99% interval we multiply the standard deviation by 2.576, in the 95% by 2, and in the 90% only by 1.645. Beyond this numerical explanation, there is a very clear intuitive explanation and an important implication of this result.<\/p>\n<p id=\"N10D4C\">Let\u2019s start with the intuitive explanation. The more certain I want to be that the interval contains the value of \u03bc, the more plausible values the interval needs to include in order to account for that extra certainty. I am 95% certain that the value of \u03bc is one of the values in the interval (112,118). In order to be 99% certain that one of the values in the interval is the value of \u03bc, I need to include more values, and thus provide a wider confidence interval.<\/p>\n<\/div>\n<\/div>\n<p id=\"N10D53\">In our example, the\u00a0<em>wider<\/em>\u00a099% confidence interval (111, 119) gives us a\u00a0<em>less precise<\/em>\u00a0estimation about the value of \u03bc than the narrower 90% confidence interval (112.5, 117.5), because the smaller interval \u201cnarrows in\u201d on the plausible values of \u03bc.<\/p>\n<p id=\"N10D5C\">The important practical implication here is that researchers must decide whether they prefer to state their results with a higher level of confidence or produce a more precise interval. 
In other words,<\/p>\n<p id=\"N10D5F\"><em>There is a trade-off between the level of confidence and the precision with which the parameter is estimated<\/em>.<\/p>\n<p id=\"N10D65\">The price we have to pay for a higher level of confidence is that the unknown population mean will be estimated with less precision (i.e., with a wider confidence interval). If we would like to estimate \u03bc with more precision (i.e., a narrower confidence interval), we will need to accept a lower level of confidence.<\/p>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p id=\"N10D72\">In a recent study, 1,115 males 25 to 35 years of age were randomly chosen and asked about their exercise habits. Based on the study results, the researchers estimated the mean time that a male 25 to 35 years of age spends exercising with 90%, 95%, and 99% confidence intervals. These were (not necessarily in the same order):<\/p>\n<div class=\"image shouldbeleft\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"Three number lines illustrating three confidence intervals. The first shows an interval of (3,4). The second, an interval of (2.5, 4.5), and the third, (2,5).\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/_u5_m1_dig3a.gif\" alt=\"Three number lines illustrating three confidence intervals. The first shows an interval of (3,4). 
The second, an interval of (2.5, 4.5), and the third, (2,5).\" \/><\/div>\n<div>\n<div id=\"h5p-216\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-216\" class=\"h5p-iframe\" data-content-id=\"216\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"9.4 Did I get this 1\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<hr \/>\n<p id=\"N10B04\">So far, we\u2019ve developed the confidence interval for the population mean from scratch, based on results from probability, and discussed the trade-off between the level of confidence and the precision of the interval. The price you pay for a higher level of confidence is a lower level of precision of the interval (i.e., a wider interval).<\/p>\n<p id=\"N10B07\">Is there a way to bypass this trade-off? In other words, is there a way to increase the precision of the interval (i.e., make it narrower)\u00a0<em class=\"italic\">without<\/em>\u00a0compromising on the level of confidence? We will answer this question shortly, but first we need to get a deeper understanding of the different components of the confidence interval and its structure.<\/p>\n<div id=\"N10B0E\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Understanding the general structure of the confidence intervals<\/span><\/h2>\n<p id=\"N10B15\">We explored the confidence interval for \u03bc for different levels of confidence and found that, in general, it has the following form:<\/p>\n<p>[latex]\\bar{x}\\pm z^{*}\\cdot \\frac{\\sigma }{\\sqrt{n}}[\/latex],<\/p>\n<p id=\"N10B47\">where z* is a general notation for the multiplier that depends on the level of confidence. 
As we discussed before:<\/p>\n<p id=\"N10B4A\">For a 90% level of confidence, z* = 1.645<\/p>\n<p id=\"N10B4D\">For a 95% level of confidence, z* = 2 (or 1.96 if you want to be really precise)<\/p>\n<p id=\"N10B50\">For a 99% level of confidence, z* = 2.576<\/p>\n<p id=\"N10B53\">To start our discussion about the structure of the confidence interval, let\u2019s denote the quantity [latex]z^{*}\\cdot\\frac{\\sigma }{\\sqrt{n}}[\/latex] by m.<\/p>\n<p id=\"N10B76\">The confidence interval, then, has the form\u00a0[latex]\\bar{x}\\pm m[\/latex]:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"A formula: x-bar \u00b1 z-star \u00d7 \u03c3\/\u221an. Note that z-star \u00d7 \u03c3\/\u221an is m.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image062.gif\" alt=\"A formula: x-bar \u00b1 z-star \u00d7 \u03c3\/\u221an. Note that z-star \u00d7 \u03c3\/\u221an is m.\" \/><\/span><\/span><\/p>\n<p id=\"N10B95\">[latex]\\overline{x}[\/latex]\u00a0is the sample mean, the point estimator for the unknown population mean (\u03bc).<\/p>\n<p><em>m<\/em>\u00a0is called the\u00a0<em class=\"italic\">margin of error<\/em>, since it represents the maximum estimation error for a given level of confidence.<\/p>\n<p id=\"N10BB2\">For example, for a 95% confidence interval, we are 95% sure that our estimate will not depart from the true population mean by more than m, the margin of error.<\/p>\n<p>m is the product of two components:<\/p>\n<p id=\"N10BB8\">z*, the confidence multiplier, and<\/p>\n<p>[latex]\\frac{\\sigma }{\\sqrt{n}}[\/latex], which is the standard deviation of [latex]\\overline{x}[\/latex], the point estimator of \u03bc.<\/p>\n<p id=\"N10BE2\">Here is a summary of the different components of the confidence interval and its structure:<\/p>\n<p><span class=\"imagewrap\"><span 
class=\"image\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"x-bar is the point estimator. The margin of error (m) is either added to it or subtracted from it. The margin of error is composed of the confidence multiplier, z-star, which is multiplied by the standard deviation of the point estimator, which is \u03c3\/\u221an.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image063.gif\" alt=\"x-bar is the point estimator. The margin of error (m) is either added to it or subtracted from it. The margin of error is composed of the confidence multiplier, z-star, which is multiplied by the standard deviation of the point estimator, which is \u03c3\/\u221an.\" \/><\/span><\/span><\/p>\n<p id=\"N10BEB\">This structure:<\/p>\n<p id=\"N10BEE\"><em><strong>estimate [latex]\\pm[\/latex] margin of error<\/strong><\/em><\/p>\n<p id=\"N10C11\">where the margin of error is further composed of the product of a confidence multiplier and the standard deviation (or, as we\u2019ll see, the standard error), is the general structure of all confidence intervals that we will encounter in this course.<\/p>\n<p>Even though each confidence interval has the same components, what these components actually are differs from one confidence interval to another, depending on which unknown parameter the confidence interval aims to estimate.<\/p>\n<p id=\"N10C17\">Since the structure of the confidence interval is such that it has a margin of error on either side of the estimate, it is centered at the estimate (in our case,\u00a0[latex]\\overline{x}[\/latex]), and its width (or length) is exactly twice the margin of error:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"_i_2\" class=\"img-responsive popimg aligncenter\" title=\"A number line, on which the estimate has been placed. 
To the left and to the right are two intervals with the size m. So, the confidence interval, which comprises of both margins of errors (the left one and right one) is of width 2m.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image065.gif\" alt=\"A number line, on which the estimate has been placed. To the left and to the right are two intervals with the size m. So, the confidence interval, which comprises of both margins of errors (the left one and right one) is of width 2m.\" \/><\/span><\/span><\/p>\n<p id=\"N10C30\">The margin of error, m, is therefore \u201cin charge\u201d of the width (or precision) of the confidence interval, and the estimate is in charge of its location (and has no effect on the width).<\/p>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div id=\"h5p-217\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-217\" class=\"h5p-iframe\" data-content-id=\"217\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"9.4 Did I get this 2\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<p id=\"N10B0C\">Let us now go back to the confidence interval for the mean, and more specifically, to the question that we posed at the beginning of the previous page:<\/p>\n<p id=\"N10B0F\">Is there a way to increase the precision of the confidence interval (i.e., make it narrower)\u00a0<em class=\"italic\">without<\/em>\u00a0compromising on the level of confidence?<\/p>\n<p id=\"N10B16\">Since the width of the confidence interval is a function of its margin of error, let\u2019s look closely at the margin of error of the confidence interval for the mean and see how it can be reduced:<\/p>\n<p>[latex]z^{*}\\cdot\\frac{\\sigma}{\\sqrt{n}}[\/latex]<\/p>\n<p id=\"N10B3C\">Since z* controls the level of 
confidence, we can rephrase our question above in the following way:<\/p>\n<p id=\"N10B3F\">Is there a way to reduce this margin of error other than by reducing z*?<\/p>\n<p id=\"N10B42\">If you look closely at the margin of error, you\u2019ll see that the answer is yes: we can increase the sample size n, which appears (under a square root) in the denominator.<\/p>\n<p id=\"N10B49\">Let\u2019s look at an example first and then explain why increasing the sample size is a way to increase the precision of the confidence interval\u00a0<em class=\"italic\">without<\/em>\u00a0compromising on the level of confidence.<\/p>\n<div class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p id=\"N10B52\">Recall the IQ example:<\/p>\n<p>The IQ level of students at a particular university has an unknown mean (\u03bc) and a known standard deviation of \u03c3 = 15. A simple random sample of 100 students is found to have the sample mean IQ\u00a0[latex]\\overline{x}=115[\/latex]. A 95% confidence interval for \u03bc in this case is:<br \/>\n[latex]\\bar{x}\\pm2\\cdot\\frac{\\sigma}{\\sqrt{n}}=115\\pm2\\left(\\frac{15}{\\sqrt{100}}\\right)=115\\pm3.0=\\left(112,118\\right)[\/latex]<br \/>\nNote that the margin of error is m = 3, and therefore the width of the confidence interval is 6.<\/p>\n<p>Now, what if we change the problem slightly by increasing the sample size, and assume that it was 400 instead of 100?<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the Population of all Students at SU. We are interested in the variable IQ, and the unknown parameter is \u03bc, the population mean IQ level. In addition, we know that \u03c3 = 15. 
From this population we create a sample of size n=400, represented by a smaller circle. In this sample, we find that x-bar = 115.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image067.gif\" alt=\"A large circle represents the Population of all Students at SU. We are interested in the variable IQ, and the unknown parameter is \u03bc, the population mean IQ level. In addition, we know that \u03c3 = 15. From this population we create a sample of size n=400, represented by a smaller circle. In this sample, we find that x-bar = 115.\" \/><\/span><\/span><\/p>\n<p>In this case, the 95% confidence interval for \u03bc is:<br \/>\n[latex]\\bar{x}\\pm2\\cdot\\frac{\\sigma}{\\sqrt{n}}=115\\pm2\\left(\\frac{15}{\\sqrt{400}}\\right)=115\\pm1.5=\\left(113.5,116.5\\right)[\/latex]<\/p>\n<p id=\"N10C72\">The margin of error here is only m = 1.5, and thus the width is only 3.<\/p>\n<p id=\"N10C75\">Note that for the same level of confidence (95%) we now have a narrower, and thus more precise, confidence interval.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p id=\"N10C79\">Let\u2019s try to understand why a larger sample size will reduce the margin of error for a fixed level of confidence. There are three ways to explain it: mathematically, using probability theory, and intuitively.<\/p>\n<p id=\"N10C7E\">We\u2019ve already alluded to the mathematical explanation; the margin of error is [latex]z^{*}\\cdot\\frac{\\sigma}{\\sqrt{n}}[\/latex], and since n, the sample size, appears in the denominator, increasing n will reduce the margin of error.<\/p>\n<p id=\"N10CA1\">As we saw in our discussion about point estimates, probability theory tells us that the standard deviation of [latex]\\overline{x}[\/latex], which is [latex]\\frac{\\sigma}{\\sqrt{n}}[\/latex], decreases as n increases. As a result, the sampling distribution of [latex]\\overline{x}[\/latex] is less spread out around \u03bc when the sample size is larger:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"Two sampling distribution curves for x-bar. 
One is squished down and wider, while the other is much taller and narrower. Both curves share the same \u03bc. The tall, narrow distribution was based on a larger sample size, which has a smaller standard deviation, and so is less spread out. This means that values of x-bar are more likely to be closer to \u03bc when the sample size is larger.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image069.gif\" alt=\"Two sampling distribution curves for x-bar. One is squished down and wider, while the other is much taller and narrower. Both curves share the same \u03bc. The tall, narrow distribution was based on a larger sample size, which has a smaller standard deviation, and so is less spread out. This means that values of x-bar are more likely to be closer to \u03bc when the sample size is larger.\" \/><\/span><\/span><\/p>\n<p id=\"N10CAA\">This explains why, with a larger sample size, the margin of error (which represents how far apart we believe\u00a0[latex]\\overline{x}[\/latex]\u00a0might be from \u03bc for a given level of confidence) is smaller.<\/p>\n<p id=\"N10CBD\">On an intuitive level, if our estimate\u00a0[latex]\\overline{x}[\/latex]\u00a0is based on a larger sample (i.e., on more information about the population), we have more faith in it (it is more reliable), and therefore we need to account for less error around it.<\/p>\n<div id=\"N10CD0\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Comment<\/span><\/h2>\n<p id=\"N10CD7\">While it is true that for a given level of confidence, increasing the sample size increases the precision of our interval estimation, in practice, increasing the sample size is not always possible. Consider a study in which there is a non-negligible cost involved in collecting data from each participant (an expensive medical procedure, for example). 
If the study has some budgetary constraints, which is usually the case, increasing the sample size from 100 to 400 may not be affordable. Another instance in which increasing the sample size is impossible is when a larger sample is simply not available, even if we could afford it. For example, consider a study on the effectiveness of a drug in curing a very rare disease among children. Since the disease is rare, there is a limited number of children who could be participants. This is the reality of statistics. Sometimes theory collides with reality, and you just do the best you can.<\/p>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div id=\"h5p-218\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-218\" class=\"h5p-iframe\" data-content-id=\"218\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"9.4 Did I get this 3\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"N10B1C\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Sample Size Calculations<\/span><\/h2>\n<p>As we just learned, for a given level of confidence, the sample size determines the size of the margin of error and thus the width, or precision, of our interval estimation. This process can be reversed.<\/p>\n<p>In situations where a researcher has some flexibility as to the sample size, the researcher can calculate in advance the sample size needed to report a confidence interval with a certain level of confidence and a certain margin of error. 
Let\u2019s look at an example.<\/p>\n<div class=\"examplewrap\">\n<div class=\"exHead\"><span class=\"scnReader\">Example<\/span><\/div>\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div>\n<p id=\"N10B2B\">Recall the example about the SAT-M scores of community college students.<\/p>\n<p id=\"N10B2E\">An educational researcher is interested in estimating \u03bc, the mean score on the math part of the SAT (SAT-M) of all community college students in his state. To this end, the researcher has chosen a random sample of 650 community college students from his state, and found that their average SAT-M score is 475. Based on a large body of research that was done on the SAT, it is known that the scores roughly follow a normal distribution, with the standard deviation\u00a0\u03c3 = 100.<\/p>\n<p>The 95% confidence interval for \u03bc is [latex]\\left(475-2\\cdot\\frac{100}{\\sqrt{650}},475+2\\cdot\\frac{100}{\\sqrt{650}}\\right)[\/latex], which is roughly [latex]475\\pm8[\/latex], or (467, 483). For a sample size of n = 650, our margin of error is about 8.<\/p>\n<p id=\"N10B9A\">Now, let\u2019s think about this problem in a slightly different way:<\/p>\n<p id=\"N10B9D\">An educational researcher is interested in estimating \u03bc, the mean score on the math part of the SAT (SAT-M) of all community college students in his state with a margin of error of (only) 5, at the 95% confidence level. 
What is the sample size needed to achieve this? (\u03c3, of course, is still assumed to be 100.)<\/p>\n<p id=\"N10BA0\">To solve this, we set:<\/p>\n<p>[latex]m=2\\cdot\\frac{100}{\\sqrt{n}}=5[\/latex]<\/p>\n<p>so<\/p>\n<p>[latex]\\sqrt{n}=\\frac{2\\left(100\\right)}{5}=40[\/latex]<\/p>\n<p id=\"N10BF5\">and<\/p>\n<p>[latex]n=\\left(\\frac{2\\left(100\\right)}{5}\\right)^2=40^2=1600[\/latex]<\/p>\n<p id=\"N10C2D\">So, with a sample of 1,600 community college students, the researcher will be able to estimate \u03bc with a margin of error of 5, at the 95% level. In this example, we can also imagine that the researcher has some flexibility in choosing the sample size, since there is a minimal cost (if any) involved in recording students\u2019 SAT-M scores, and there are many more than 1,600 community college students in each state.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p id=\"N10C31\">Rather than take the same steps to isolate n every time we solve such a problem, we can obtain a general expression for the sample size n required for a desired margin of error m and a given level of confidence.<\/p>\n<p id=\"N10C34\">Since\u00a0[latex]m=z^{*}\\cdot\\frac{\\sigma }{\\sqrt{n}}[\/latex]\u00a0is the formula to determine m for a given n, we can use simple algebra to express n in terms of m (multiply both sides by the square root of n, divide both sides by m, and square both sides) to get<\/p>\n<p>[latex]n=\\left(\\frac{z^{*}\\sigma}{m}\\right)^2[\/latex]<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"N10C8D\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Comment<\/span><\/h2>\n<p id=\"N10C94\">Clearly, the sample size n must be an integer. In the previous example we got n = 1,600, but in other situations, the calculation may give us a non-integer result. 
In these cases, we should always\u00a0<em>round up to the next integer.<\/em><\/p>\n<p id=\"N10C9A\">Using this \u201cconservative approach,\u201d we\u2019ll achieve an interval at least as narrow as the one desired.<\/p>\n<div class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p id=\"N10C9F\">IQ scores are known to vary normally with a standard deviation of 15. How many students should be sampled if we want to estimate the population mean IQ at 99% confidence with a margin of error equal to 2?<\/p>\n<p>[latex]n=\\left(\\frac{z^{*}\\sigma}{m}\\right)^2=\\left(\\frac{2.576\\left(15\\right)}{2}\\right)^2\\approx373.26[\/latex]<\/p>\n<p id=\"N10D13\">Round up to be safe, and take a sample of 374 students.<\/p>\n<\/div>\n<\/div>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div id=\"h5p-219\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-219\" class=\"h5p-iframe\" data-content-id=\"219\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"9.4 Learn by doing 2\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"N10D3B\" class=\"section\">\n<div class=\"sectionContain\">\n<h2>Comment<\/h2>\n<p id=\"N10D42\">In the preceding activity, you saw that in order to calculate the sample size when planning a study, you needed to know the population standard deviation, sigma (\u03c3). In practice, sigma is usually not known, because it is a population parameter and we rarely have data on the entire population. 
(The rare exceptions are certain variables, such as IQ scores or scores on standardized tests, that might be constructed to have a particular known sigma.)<\/p>\n<p id=\"N10D44\">Therefore, when researchers wish to compute the required sample size in preparation for a study, they use an\u00a0<em>estimate<\/em>\u00a0of sigma. Usually, sigma is estimated based on the standard deviation obtained in prior studies.<\/p>\n<p>However, in some cases, there might not be any prior studies on the topic. In such instances, a researcher still needs to get a rough estimate of the standard deviation of the (yet-to-be-measured) variable, in order to determine the required sample size for the study. One way to get such a rough estimate is with the &#8220;range rule of thumb,&#8221; which you will practice in the following activity.<\/p>\n<\/div>\n<\/div>\n<p id=\"N10D50\">The purpose of the next activity is to give you some experience with a method for roughly estimating sigma (\u03c3, the population standard deviation) when no prior studies are available, in order to compute sample size when planning a first study.<\/p>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div id=\"h5p-220\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-220\" class=\"h5p-iframe\" data-content-id=\"220\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"9.4 Learn by doing 3\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<p id=\"e926c636a7d64fab852a5ea6fc0aff31\">We are almost done with this section. 
We need to discuss just a few more questions:<\/p>\n<ul id=\"a4aa87d73dec4341b6c29d5ac0ddbf3c\">\n<li>\n<p id=\"af1141518e8dd4d1aa31b63d2ff0c6e5b\">Is it always okay to use the confidence interval we developed for \u03bc when \u03c3 is known?<\/p>\n<\/li>\n<li>\n<p id=\"adda3bccbe99242e7a0133ca2ec830923\">What if \u03c3 is unknown?<\/p>\n<\/li>\n<li>\n<p id=\"af820d6216c0b41c19fac8607140577a0\">How can we use statistical software to calculate confidence intervals for us?<\/p>\n<\/li>\n<\/ul>\n<div id=\"d8d1339cce934ab290950d7904aafaed\" class=\"section purposewrap\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">When Is It Safe to Use the Confidence Interval We Developed?<\/span><\/h2>\n<p id=\"fc23cfe2184a419daac2566fd2ca33a8\">One of the most important things to learn with any inference method is the conditions under which it is safe to use it. It is very tempting to apply a certain method, but if the conditions under which this method was developed are not met, then using this method will lead to unreliable results, which can then lead to wrong and\/or misleading conclusions. As you\u2019ll see throughout this section, we always discuss the conditions under which each method can be safely used.<\/p>\n<p id=\"d7086d37e1d94ef8ab60eb11c491543e\">In particular, the confidence interval for \u03bc (when \u03c3 is known),\u00a0[latex]\\bar{x}\\pm z^{*}\\cdot\\frac{\\sigma }{\\sqrt{n}}[\/latex], was developed assuming that the sampling distribution of\u00a0[latex]\\overline{X}[\/latex]\u00a0is normal; in other words, that the Central Limit Theorem applies. 
This normality is what allowed us to determine the values of z*, the confidence multiplier, for different levels of confidence.<\/p>\n<p id=\"cb460886354b47ab85b316cf50c88388\">First,\u00a0<em class=\"italic\">the sample must be random.<\/em>\u00a0Assuming that the sample is random, recall from the Probability unit that the sampling distribution of [latex]\\overline{X}[\/latex] is (at least approximately) normal when the\u00a0<em class=\"italic\">sample size is large<\/em>\u00a0(by the Central Limit Theorem; a common rule of thumb for \u201clarge\u201d is n &gt; 30), or, for\u00a0<em class=\"italic\">smaller sample sizes<\/em>, when the quantitative\u00a0<em class=\"italic\">variable<\/em>\u00a0of interest is known to be\u00a0<em class=\"italic\">distributed normally<\/em>\u00a0in the population. The only situation in which we cannot use the confidence interval, then, is when the sample size is small and the variable of interest is not known to have a normal distribution. In that case, other methods, such as nonparametric methods, which are beyond the scope of this course, need to be used. This can be summarized in the following table:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"a5c5bd93cef74b7786c187d516525057\" class=\"img-responsive popimg aligncenter\" title=\"A table with two columns and two rows. 
The column headings are: &quot;Small Sample Size&quot; and &quot;Large Sample Size.&quot; The row headings are &quot;Variable varies normally&quot; and &quot;Variable doesn&apos;t vary normally.&quot; Here is the data in the table by cell in &quot;Row, Column: Value&quot; format: Variable varies normally, Small sample size: OK; Variable varies normally, Large sample size: OK; Variable doesn&apos;t vary normally, Small sample size: NOT OK; Variable doesn&apos;t vary normally, Large sample size: OK.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image077.gif\" alt=\"A table with two columns and two rows. The column headings are: &quot;Small Sample Size&quot; and &quot;Large Sample Size.&quot; The row headings are &quot;Variable varies normally&quot; and &quot;Variable doesn&apos;t vary normally.&quot; Here is the data in the table by cell in &quot;Row, Column: Value&quot; format: Variable varies normally, Small sample size: OK; Variable varies normally, Large sample size: OK; Variable doesn&apos;t vary normally, Small sample size: NOT OK; Variable doesn&apos;t vary normally, Large sample size: OK.\" \/><\/span><\/span><\/p>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p id=\"N10075\">Below are four different situations in which a confidence interval for \u03bc is called for.<\/p>\n<p id=\"N10077\"><em>Situation A:<\/em>\u00a0In order to estimate \u03bc, the mean annual salary of high-school teachers in a certain state, a random sample of 150 teachers was chosen and their average salary was found to be $38,450. 
From past experience, it is known that teachers&#8217; salaries have a standard deviation of $5,000.<\/p>\n<p id=\"N1007B\"><em>Situation B:<\/em>\u00a0A medical researcher wanted to estimate \u03bc, the mean recovery time from open-heart surgery for males between the ages of 50 and 60. The researcher followed the next 15 male patients in this age group who underwent open-heart surgery in his medical institute through their recovery period. (Comment: Even though the sample was not strictly random, there is no reason to believe that the sample of &#8220;the next 15 patients&#8221; introduces any bias, so it is as good as a random sample). The mean recovery time of the 15 patients was 26 days. From the large body of research that was done in this area, it is assumed that recovery times from open-heart surgery have a standard deviation of 3 days.<\/p>\n<p id=\"N1007F\"><em>Situation C:<\/em>\u00a0In order to estimate \u03bc, the mean score on the quantitative reasoning part of the GRE (Graduate Record Examination) of all MBA students, a random sample of 1,200 MBA students was chosen, and their scores were recorded. The sample mean was found to be 590. It is known that the quantitative reasoning scores on the GRE vary normally with a standard deviation of 150.<\/p>\n<p id=\"N10083\"><em>Situation D:<\/em>\u00a0A psychologist wanted to estimate \u03bc, the mean time it takes 6-year-old children diagnosed with Down&#8217;s Syndrome to complete a certain cognitive task. A random sample of 12 children was chosen and their times were recorded. The average time it took the 12 children to complete the task was 7.5 minutes. 
From past experience with similar tasks, the time is known to vary normally with a standard deviation of 1.3 minutes.<\/p>\n<div id=\"h5p-221\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-221\" class=\"h5p-iframe\" data-content-id=\"221\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"9.4 Did I get this 4\"><\/iframe><\/div>\n<\/div>\n<p id=\"a7f4998a72bd467d92bcd656a24f923c\">Below are four different situations in which a confidence interval formula would be useful:<\/p>\n<p id=\"f01945b34ce545b2b959d7a44f891b02\"><em class=\"italic\">Situation A:<\/em>\u00a0A marketing executive wants to estimate the average time, in days, that a watch battery will last. She tests 50 randomly selected batteries and finds that the distribution is skewed to the left, since a couple of the batteries were defective. It is known from past experience that the standard deviation is 25 days.<\/p>\n<p id=\"cb9235a3672a4769a0a6600f3f561f6a\"><em class=\"italic\">Situation B:<\/em>\u00a0A college professor desires an estimate of the mean number of hours per week that full-time college students are employed. He randomly selected 250 college students and found that they worked a mean time of 18.6 hours per week. He uses previously known data for his standard deviation.<\/p>\n<p id=\"cfaa1036e2ae4e59bbd0ca3ce1fe9ed6\"><em class=\"italic\">Situation C:<\/em>\u00a0A medical researcher at a sports medicine clinic uses 35 volunteers from the clinic to study the average number of hours the typical American exercises per week. It is known that hours of exercise are normally distributed and past data give him a standard deviation of 1.2 hours.<\/p>\n<p id=\"d4f614ad35a14c0393ea2371914072f6\"><em class=\"italic\">Situation D:<\/em>\u00a0A high-end auto manufacturer tests 5 randomly selected cars to find out the damage caused by a 5 mph crash. It is known that this distribution is normal. 
Assume that the standard deviation is known.<\/p>\n<div class=\"asx\">\n<div id=\"du4_m1_confintmean7_digt_tutor1\" class=\"activitywrap sectionNest flash\">\n<div class=\"activityhead\">\n<div class=\"activityinfo\"><\/div>\n<\/div>\n<div class=\"actContain\">\n<div class=\"activity flash\">\n<div id=\"u4_m1_confintmean7_digt_tutor1\" class=\"flash_obj asx testFlash mark_flash\">\n<div id=\"ou4_m1_confintmean7_digt_tutor1\" class=\"page 2963021\">\n<div id=\"2963021\" class=\"question\">\n<div id=\"h5p-222\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-222\" class=\"h5p-iframe\" data-content-id=\"222\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"9.4 Did I get this 5\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div>\n<div class=\"section purposewrap\">\n<div class=\"sectionContain\">\n<p id=\"e1b9734a819a407cb80b4d44ac889fe5\"><em class=\"italic\">What if\u00a0<\/em>\u03c3\u00a0<em class=\"italic\">is unknown?<\/em><\/p>\n<p id=\"b40d6877b5894422aff7930d7c9009ac\">As we discussed earlier, when variables have been well-researched in different populations it is reasonable to assume that the population standard deviation (\u03c3) is known. However, this is rarely the case. What if \u03c3 is unknown?<\/p>\n<p id=\"babe4722279c4920ad165c821861bb4d\">Well, there is some good news and some bad news.<\/p>\n<p id=\"ef8d299b51b04e039ea1c6ef56d5bb22\">The good news is that we can easily replace the population standard deviation, \u03c3, with the\u00a0<em class=\"italic\">sample<\/em>\u00a0standard deviation, s.<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"b9c36bf450c14920a7ba79c4f46caaca\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population of interest. \u03bc is unknown and \u03c3 is unknown. 
From the population we create an SRS of size n, represented by a smaller circle. We can find x-bar for this SRS, and we can also obtain s. We use this instead of the unknown \u03c3.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image078.gif\" alt=\"A large circle represents the population of interest. \u03bc is unknown and \u03c3 is unknown. From the population we create an SRS of size n, represented by a smaller circle. We can find x-bar for this SRS, and we can also obtain s. We use this instead of the unknown \u03c3.\" \/><\/span><\/span><\/p>\n<p id=\"ad28be4f62ac44ed9bbd2f403825ce3b\">The bad news is that once \u03c3 has been replaced by s, the standardized quantity [latex]\\frac{\\overline{X}-\\mu}{s\/\\sqrt{n}}[\/latex] no longer follows a standard normal distribution (s itself varies from sample to sample, which adds extra variability). Therefore, the confidence multipliers z* for the different levels of confidence (1.645, 2, 2.576) are (generally) no longer accurate. The new multipliers come from a different distribution called the \u201ct distribution\u201d and are therefore denoted by t* (instead of z*). We will discuss the t distribution in more detail when we talk about hypothesis testing.<\/p>\n<p id=\"f233d0f9df3b4c859e7901b9ebb2522d\">The confidence interval for the population mean (\u03bc) when \u03c3 is unknown is therefore:<\/p>\n<p>[latex]\\bar{x}\\pm t^{*}\\cdot\\frac{s}{\\sqrt{n}}[\/latex]<\/p>\n<p id=\"c8a0354558694dbc8e8c642f1cb53495\">(Note that this interval is very similar to the one when \u03c3 is known, with the obvious changes: s replaces \u03c3, and t* replaces z* as discussed above.)<\/p>\n<p id=\"c59fa88aeb2b425c9213365feffb0efa\">There is an important difference between the confidence multipliers we have used so far (z*) and those needed for the case when \u03c3 is unknown (t*). 
Unlike the confidence multipliers we have used so far (z*), which depend only on the level of confidence, the new multipliers (t*) have the\u00a0<em class=\"italic\">added complexity<\/em>\u00a0that they\u00a0<em class=\"italic\">depend on both the level of confidence and the sample size<\/em>\u00a0(for example, the t* used in a 95% confidence interval when n = 10, about 2.26, is different from the t* used when n = 40, about 2.02). Because of this added complexity in determining the appropriate t*, we will rely heavily on software in this case.<\/p>\n<\/div>\n<\/div>\n<div id=\"aa8aab1bd3934f93b4266e13c4a063ea\" class=\"section purposewrap\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Comments<\/span><\/h2>\n<ol>\n<li id=\"f820c0b50aac4393a0a083de956bdfe0\">Since it is quite rare that \u03c3 is known, this interval (sometimes called a <em class=\"italic\">one-sample t confidence interval<\/em>) is more commonly used as the confidence interval for estimating \u03bc. (Nevertheless, we could not have presented it without our extended discussion up to this point, which also provided you with a solid understanding of confidence intervals.)<\/li>\n<li id=\"af53d45f69d94a018c1959cf48e1d25c\">The quantity [latex]\\frac{s}{\\sqrt{n}}[\/latex] is called the\u00a0<em class=\"italic\">standard error<\/em>\u00a0of\u00a0[latex]\\overline{X}[\/latex]. Probability theory tells us that [latex]\\frac{\\sigma }{\\sqrt{n}}[\/latex] is the\u00a0<em class=\"italic\">standard deviation<\/em>\u00a0of\u00a0[latex]\\overline{X}[\/latex] (and this is the quantity used in the confidence interval when \u03c3 is known). In general, whenever we replace parameters with their sample counterparts in the standard deviation of a statistic, the resulting quantity is called the standard error of the statistic. 
In this case, we replaced \u03c3 with its sample counterpart (s), and thus\u00a0[latex]\\frac{s}{\\sqrt{n}}[\/latex]\u00a0is the\u00a0<em class=\"italic\">standard error<\/em>\u00a0of (the statistic)\u00a0[latex]\\overline{X}[\/latex].<\/li>\n<li id=\"ad7a9bdd565a4dde97fa7fe4dce87f04\">As before, to safely use this confidence interval, the sample <em class=\"italic\">must be random<\/em>, and the only case when this interval cannot be used is when the sample size is small and the variable is not known to vary normally.<\/li>\n<\/ol>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div id=\"h5p-223\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-223\" class=\"h5p-iframe\" data-content-id=\"223\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"9.4 Learn by doing 4\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<h2><span title=\"Quick scroll up\">Final comment<\/span><\/h2>\n<p id=\"dcbcf0c7629e4233bde8eae1030c35a2\">It turns out that for large values of n, the t* multipliers are not that different from the z* multipliers, and therefore using the interval formula:<\/p>\n<p>[latex]\\bar{x}\\pm z^{*}\\cdot\\frac{s}{\\sqrt{n}}[\/latex]<\/p>\n<p id=\"c25ac0eb8c9f4bf59941bd2147d71f6f\">for \u03bc when \u03c3 is unknown provides a pretty good approximation.<\/p>\n<h2><span title=\"Quick scroll up\">Let\u2019s summarize<\/span><\/h2>\n<p id=\"cc9aef71106649219eaf7b3bc4d55c98\">* When the sample is random and the population is normal and\/or the sample is large, a confidence interval for the unknown population mean \u03bc when \u03c3 is known is:<\/p>\n<p>[latex]\\bar{x}\\pm z^{*}\\cdot\\frac{\\sigma }{\\sqrt{n}}[\/latex], where z* is 1.645 for 90% confidence, 2 for 95% confidence, and 2.576 for 99% confidence.<\/p>\n<p id=\"b69b41268d704e14bb370b2a7703b028\">* There is a trade-off between the level of confidence and the precision of the 
interval estimation: for a fixed sample size, the price we pay for more precision (a narrower interval) is a lower level of confidence.<\/p>\n<p id=\"a99e5aa935b2475daf1190048ceb301f\">* The general form of a confidence interval is an estimate +\/- the margin of error (m). In this case, the estimate is\u00a0[latex]\\bar{x}[\/latex] and\u00a0[latex]m=z^{*}*\\frac{\\sigma }{\\sqrt{n}}[\/latex]. The confidence interval is therefore centered at the estimate, and its width is exactly 2m.<\/p>\n<p id=\"e048e0b2133e4c3a8809c51146e2d544\">* For a given level of confidence, the width of the interval depends on the sample size. We can therefore do a sample size calculation to determine the sample size needed to get a confidence interval with a desired margin of error m at a given level of confidence (assuming we have some flexibility with the sample size). For the sample size calculation we use:<\/p>\n<p>[latex]n=\\left(\\frac{z^*\\sigma}{m}\\right)^2[\/latex]<\/p>\n<p id=\"f6889a0cf34643efac5a7abc4533b138\">(and round\u00a0<em class=\"italic\">up<\/em>\u00a0to the next integer).<\/p>\n<p id=\"c3ca044d4d624a1e984ee1cc715d182f\">* When \u03c3 is unknown, we use the sample standard deviation, s, instead, but as a result we also need to use a different set of confidence multipliers (t*) associated with the t distribution. The interval is therefore<\/p>\n<p>[latex]\\bar{x}\\pm t^{*}*\\frac{s}{\\sqrt{n}}[\/latex]<\/p>\n<p id=\"c8e19115a4114af5a8ecaf5f1f0dafe5\">* These new multipliers have the added complexity that they depend not only on the level of confidence, but also on the sample size. 
Software is therefore very useful for calculating confidence intervals in this case.<\/p>\n<p id=\"cc6aac5cc77e495eacf7746ce1751300\">* For large values of n, the t* multipliers are not that different from the z* multipliers, and therefore using the interval formula:<\/p>\n<p>[latex]x\\pm z^{*}*\\frac{s}{\\sqrt{n}}[\/latex]<\/p>\n<p id=\"b7897adc0eb14a86a50b80d73c120f21\">for \u03bc when \u03c3 is unknown provides a pretty good approximation.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"author":150,"menu_order":11,"template":"","meta":{"pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[48],"contributor":[],"license":[],"class_list":["post-569","chapter","type-chapter","status-publish","hentry","chapter-type-numberless"],"part":421,"_links":{"self":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters\/569","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/users\/150"}],"version-history":[{"count":40,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters\/569\/revisions"}],"predecessor-version":[{"id":1104,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters\/569\/revisions\/1104"}],"part":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/parts\/421"}],"metadata":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters\/569\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1
260\/wp-json\/wp\/v2\/media?parent=569"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapter-type?post=569"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/contributor?post=569"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/license?post=569"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}