{"id":557,"date":"2024-10-18T02:40:09","date_gmt":"2024-10-18T02:40:09","guid":{"rendered":"https:\/\/pressbooks.ccconline.org\/mat1260\/?post_type=chapter&#038;p=557"},"modified":"2025-01-10T17:51:48","modified_gmt":"2025-01-10T17:51:48","slug":"8-2-hypothesis-tests-for-proportions","status":"publish","type":"chapter","link":"https:\/\/pressbooks.ccconline.org\/mat1260\/chapter\/8-2-hypothesis-tests-for-proportions\/","title":{"raw":"8.2: Hypothesis Tests for Proportions","rendered":"8.2: Hypothesis Tests for Proportions"},"content":{"raw":"<div id=\"lobjh\" class=\"\">\r\n<div class=\"textbox textbox--learning-objectives\"><header class=\"textbox__header\">\r\n<h2 class=\"textbox__title\">Learning Objectives<\/h2>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<ul>\r\n \t<li id=\"specify_hypotheses\">In a given context, specify the null and alternative hypotheses for the population proportion and mean.<\/li>\r\n \t<li id=\"carry_out_hypothesis_testing\">Carry out hypothesis testing for the population proportion and mean (when appropriate), and draw conclusions in context.<\/li>\r\n \t<li id=\"apply_concepts\">Apply the concepts of sample size, statistical significance vs. practical importance, and the relationship between hypothesis testing and confidence intervals.<\/li>\r\n<\/ul>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div id=\"a3c0a6255aaa4086beefe1d80d59db02\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Overview<\/span><\/h2>\r\n<p id=\"e76b43d3b24e4947b36877ae688e7bf9\">Now that we understand the process of hypothesis testing and the logic behind it, we are ready to start learning about specific statistical tests (also known as significance tests).<\/p>\r\n<p id=\"ebc6ca81c302463eb6c9bb014971df57\">The first test we will learn is the test about the population proportion (p). 
This test is widely known as the\u00a0<em class=\"italic\">z-test for the population proportion (p).<\/em>\u00a0(We will see later where the \u201cz-test\u201d part of the name comes from.)<\/p>\r\n<p id=\"eaffff59cb6e4825a4c13e3e6b65f0b4\">When we conduct a test about a population proportion, we are working with a categorical variable. Later in the course, after we have learned a variety of hypothesis tests, we will need to be able to identify which test is appropriate for which situation. Identifying the variable as categorical or quantitative is an important component of choosing an appropriate hypothesis test.<\/p>\r\n\r\n<div class=\"asx\">\r\n<div id=\"du4_m2_testprop1_tutor1\" class=\"activitywrap purpose learnbydoing flash\">\r\n<div class=\"activityhead\">\r\n<div class=\"purposeType purposelearnbydoing\" title=\"\"><span class=\"scnReader\">learn by doing<\/span><\/div>\r\n<\/div>\r\n<div class=\"actContain\">\r\n<div class=\"activity flash\">\r\n<div id=\"u4_m2_testprop1_tutor1\" class=\"flash_obj asx testFlash mark_flash\">\r\n<div id=\"ou4_m2_testprop1_tutor1\" class=\"page 271710 2962796 2962797 2962798 2962799\">\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<div class=\"activity flash\">\r\n<div class=\"flash_obj asx testFlash mark_flash\">\r\n<div class=\"page 271710 2962796 2962797 2962798 2962799\">\r\n<div><\/div>\r\n<div id=\"2962798\" class=\"question ddfb\">\r\n<div>\r\n<p id=\"N10090\">[h5p id=\"161\"]<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<p id=\"bb62411327c348b0b2e4d9fff00047f6\">Our discussion of hypothesis testing for the population proportion p follows the four steps that we introduced in our general discussion of hypothesis testing, but this time we go into more 
details. In particular, we learn how the test statistic and p-value are calculated and interpreted.<\/p>\r\n<p id=\"e7d8247936434ba4842f20b9bc23faf9\">Once we learn how to carry out the test for the population proportion p, we discuss some general topics related to hypothesis testing. Specifically, we see what role the sample size plays and how hypothesis testing and interval estimation (confidence intervals) are related.<\/p>\r\n<p id=\"a00493797c6c48c089285e46c2893c4e\">Let\u2019s start by introducing three examples, which will be our leading examples in this discussion. Each example is followed by a figure illustrating the information provided, as well as the question of interest.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div id=\"fb54f09b895242408f0dda3a5d7f87bb\" class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example 1<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<div>\r\n<p id=\"dbd0f3a5a2714b26b474a1a602817bac\">A machine is known to produce 20% defective products, and is therefore sent for repair. After the machine is repaired, 400 products produced by the machine are chosen at random and 64 of them are found to be defective. Do the data provide enough evidence that the proportion of defective products produced by the machine (p) has been\u00a0<em class=\"italic\">reduced<\/em>\u00a0as a result of the repair?<\/p>\r\n<p id=\"df6fa0d7c2704114a0c8d8a663986876\">The following figure displays the information, as well as the question of interest:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"afee653830e14a3ca4e2825ecef97bd1\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population of products produced by the machine (following the repair). We want to know p for this population, that is, the proportion of defective products. 
The question we wish to answer is &quot;is p still .20 or has it been reduced?&quot; We take a sample of 400 products, represented by a smaller circle. We find that 64 of these are defective.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image212.gif\" alt=\"A large circle represents the population of products produced by the machine (following the repair). We want to know p for this population, that is, the proportion of defective products. The question we wish to answer is &quot;is p still .20 or has it been reduced?&quot; We take a sample of 400 products, represented by a smaller circle. We find that 64 of these are defective.\" \/><\/span><\/span>\r\n<p id=\"a0b0e5c58e724149a8ba352b2b6d6a27\">The question of interest helps us formulate the null and alternative hypotheses in terms of p, the proportion of defective products produced by the machine following the repair:<\/p>\r\n<p id=\"c832df819f7d4eb8b63136557d0bca07\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">o<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0p = .20 (No change; the repair did not help).<\/p>\r\n<p id=\"e82d35e3d6ba49ae89504dd54a189c2a\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">a<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0p &lt; .20 (The repair was effective).<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div id=\"a426f1c3ebfd4078981577a8cdd3ece3\" class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example 2<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<div>\r\n<p id=\"bf2f9e1d23564014bf8143bdf8cd591b\">There are rumors that students at a certain liberal arts college are more inclined to use drugs than U.S. college students in general. 
Suppose that in a simple random sample of 100 students from the college, 19 admitted to marijuana use. Do the data provide enough evidence to conclude that the proportion of marijuana users among the students in the college (p) is\u00a0<em class=\"italic\">higher<\/em>\u00a0than the national proportion, which is .157? (This number is reported by the Harvard School of Public Health.)<\/p>\r\n<p id=\"fbf6710943db465e934ef1ef790df49b\">Again, the following figure displays the information as well as the question of interest:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"df34bf92bc44434cac8551a0b7fa1816\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population of students at the college. We want to know p for this population, that is, the proportion of students who use marijuana. The question we wish to answer is &quot;is p .157 (like the national figure) or higher?&quot; We take a sample of 100 students, represented by a smaller circle. We find that 19 use marijuana.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image213.gif\" alt=\"A large circle represents the population of students at the college. We want to know p for this population, that is, the proportion of students who use marijuana. The question we wish to answer is &quot;is p .157 (like the national figure) or higher?&quot; We take a sample of 100 students, represented by a smaller circle. 
We find that 19 use marijuana.\" \/><\/span><\/span>\r\n<p id=\"fe08ff72850e47de8d19a10a7e775083\">As before, we can formulate the null and alternative hypotheses in terms of p, the proportion of students in the college who use marijuana:<\/p>\r\n<p id=\"ff7aaca85e87423ba460a3223e3998fb\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">o<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0p = .157 (Same as among all college students in the country).<\/p>\r\n<p id=\"a3333c60cb12413e945769d30503afac\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">a<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0p &gt; .157 (Higher than the national figure).<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div id=\"b9265c113f964dcdaf4d9b79dd242682\" class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example 3<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<div>\r\n<p id=\"e70b10b4fb554b97935df11e0a2503c2\">Polls on certain topics are conducted routinely in order to monitor changes in the public\u2019s opinion over time. One such topic is the death penalty. In 2003, a poll estimated that 64% of U.S. adults support the death penalty for a person convicted of murder. In a more recent poll, 675 out of 1,000 U.S. adults chosen at random were in favor of the death penalty for convicted murderers. Do the results of this poll provide evidence that the proportion of U.S. adults who support the death penalty for convicted murderers (p)\u00a0<em class=\"italic\">changed<\/em>\u00a0between 2003 and the later poll?<\/p>\r\n<p id=\"d3dc7a96486d4953a5d5e5431f533071\">Here is a figure that displays the information, as well as the question of interest:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"b15bcf532b0145cbbb8cf4f2228f4d70\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population of U.S. adults. 
We want to know p for this population, that is, the proportion who support the death penalty. The question we wish to answer is &quot;has p changed since 2003 (when it was .64)?&quot; We take a sample of 1,000 U.S. adults, represented by a smaller circle. We find that 675 are in favor.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image214.gif\" alt=\"A large circle represents the population of U.S. adults. We want to know p for this population, that is, the proportion who support the death penalty. The question we wish to answer is &quot;has p changed since 2003 (when it was .64)?&quot; We take a sample of 1,000 U.S. adults, represented by a smaller circle. We find that 675 are in favor.\" \/><\/span><\/span>\r\n<p id=\"d63555cf2b914c3da96b849aaebd11a9\">Again, we can formulate the null and alternative hypotheses in terms of p, the proportion of U.S. adults who support the death penalty for convicted murderers:<\/p>\r\n<p id=\"af6d8eac990e41259f67b9117b1c8e9d\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">o<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0p = .64 (No change from 2003).<\/p>\r\n<p id=\"c158fa05ebca488db1915f422cd61715\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">a<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0p \u2260 .64 (Some change since 2003).<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\nAccording to the American Association of Community Colleges, 23% of community college students receive federal grants. The California Community College Chancellor\u2019s Office anticipates that the percentage is smaller for California community college students. 
They collect a sample of 1,000 community college students in California and find that 210 received federal grants.\r\n\r\n[h5p id=\"162\"]\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\nUsing data from 2008, the American Association of Community Colleges (AACC) reports that community college students constitute 46% of all U.S. undergraduates. Given the downturn in the U.S. economy, the AACC anticipates an increase in this percentage for 2010. A poll of 500 randomly chosen undergraduates taken in 2010 indicates that 52% are attending a community college.\r\n\r\n[h5p id=\"163\"]\r\n\r\n<\/div>\r\n<\/div>\r\n<p id=\"b88721ac15ab446bbf37500bff8394bf\">Recall that there are four basic steps in the process of hypothesis testing:<\/p>\r\n<p id=\"d239382347e845389ba491a7ec80aa9f\">1. State the null and alternative hypotheses.<\/p>\r\n<p id=\"d0aa1929bd8244fe99ffda6559663c9a\">2. Collect relevant data from a random sample and summarize them (using a test statistic).<\/p>\r\n<p id=\"bc5c8d3b03464a089c091575d8306317\">3. Find the p-value: the probability of observing data at least as extreme as those observed, assuming that H<sub>o<\/sub>\u00a0is true.<\/p>\r\n<p id=\"fe7d1129b0b0463fbf2acb03edb5564c\">4. Based on the p-value, decide whether we have enough evidence to reject H<sub>o<\/sub>\u00a0(and accept H<sub>a<\/sub>), and draw our conclusions in context.<\/p>\r\n<p id=\"ebad0df85d604d10984c71888e87427b\">We now go through these steps as they apply to hypothesis testing for the population proportion p. 
Although the details are specific to this particular test, some of the ideas we add along the way apply to hypothesis testing in general.<\/p>\r\n\r\n<div id=\"c825f2c0acc64131b1b4816ca0288d1e\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">1. Stating the Hypotheses<\/span><\/h2>\r\n<p id=\"c3dd31be633149f08e5b33574fd00035\">Here again are the three sets of hypotheses being tested in our three examples:<\/p>\r\n\r\n<div id=\"fbf3d95b57cd45d0b74254247d1b5824\" class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example 1<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<div>\r\n<p id=\"be9358131b9445c49c06b43d6a1b4543\">Has the proportion of defective products been reduced as a result of the repair?<\/p>\r\n<p id=\"d9d5e0fc474547bea7491be03d78aeac\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">o<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0p = .20 (No change; the repair did not help).<\/p>\r\n<p id=\"ecb5fa65497840f3a84597d08186be9b\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">a<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0p &lt; .20 (The repair was effective).<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div id=\"b4fc1a1179fe4e02a2aac9f4284062b5\" class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example 2<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<div>\r\n<p id=\"c7e1c9c4c72a4e238320bea0d984a607\">Is the proportion of marijuana users in the college higher than the national figure?<\/p>\r\n<p id=\"e21353fdbec74a118a3ea61c2d510e67\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">o<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0p = .157 (Same as among all college 
students in the country).<\/p>\r\n<p id=\"fd7e524dba2149c7bb4a72774fc1f117\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">a<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0p &gt; .157 (Higher than the national figure).<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div id=\"cc29a6e7218e401fa61b7e97114626f7\" class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example 3<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<div>\r\n<p id=\"e11c01be33cc482192cb32e6edca2cbc\">Did the proportion of U.S. adults who support the death penalty change between 2003 and a later poll?<\/p>\r\n<p id=\"ae136a2a0dbb477bbbf6a6fc8bed05a1\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">o<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0p = .64 (No change from 2003).<\/p>\r\n<p id=\"fd6900d28a564ff6bfa28000c381bb14\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">a<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0p \u2260 .64 (Some change since 2003).<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<p id=\"ec7de58bbd6b4f8b9e42fc9d9e17591f\">Note that the null hypothesis always takes the form:<\/p>\r\n<p id=\"aec73d9a03cd40658caaf90d500cb68b\">H<sub>o<\/sub>: p = some value<\/p>\r\n<p id=\"ea20157d02ce4b4bad892cc8a5cde326\">and the alternative hypothesis takes one of the following three forms:<\/p>\r\n<p id=\"c8d10433033e4243a57c2e994e18c42a\">H<sub>a<\/sub>: p &lt; that value (as in Example 1)\u00a0<em class=\"italic\">or<\/em><\/p>\r\n<p id=\"f16e5cc3e8d4454d856f4bd28cccbce4\">H<sub>a<\/sub>: p &gt; that value (as in Example 2)\u00a0<em class=\"italic\">or<\/em><\/p>\r\n<p id=\"e909cc7daa6240de9003c269dbcfbeca\">H<sub>a<\/sub>: p \u2260 that value (as in Example 3).<\/p>\r\n<p id=\"a6c80a97c1a74ebdb9a28fea8133cbd3\">In each example, the context made it clear which form of the alternative hypothesis 
would be appropriate. The value specified in the null hypothesis is called the\u00a0<em class=\"italic\">null value<\/em>\u00a0and is generally denoted by p<sub>o<\/sub>. In general, therefore, the null hypothesis about the population proportion (p) takes the form:<\/p>\r\n<p id=\"fa509c65b5d84c478b1a10f7efb84711\">H<sub>o<\/sub>: p = p<sub>o<\/sub><\/p>\r\n<p id=\"a1ee5e6b2a5c494ca191f4b2cf7e2138\">Writing H<sub>o<\/sub>: p = p<sub>o<\/sub>\u00a0states the hypothesis that the population proportion equals p<sub>o<\/sub>. In other words, p is the unknown population proportion, and p<sub>o<\/sub>\u00a0is the specific value that the null hypothesis claims for p in the given situation.<\/p>\r\n<p id=\"aaad80f916a14128a7982a540607935f\">The alternative hypothesis takes one of the following three forms (depending on the context):<\/p>\r\n<p id=\"e3f35d1a20964cca89e4fd1773c65686\">H<sub>a<\/sub>: p &lt; p<sub>o<\/sub>\u00a0<em class=\"italic\">(one-sided)<\/em><\/p>\r\n<p id=\"bde1c812903142ed874ea3a5e52f660d\">H<sub>a<\/sub>: p &gt; p<sub>o<\/sub>\u00a0<em class=\"italic\">(one-sided)<\/em><\/p>\r\n<p id=\"b24ddfd044204a02ad9e36c3d13654f9\">H<sub>a<\/sub>: p \u2260 p<sub>o<\/sub>\u00a0<em class=\"italic\">(two-sided)<\/em><\/p>\r\n<p id=\"def808dd167840b391efe741051b1ef8\">The first two forms of the alternative (where the = sign in H<sub>o<\/sub>\u00a0is challenged by &lt; or &gt;) are called\u00a0<em class=\"italic\">one-sided alternatives<\/em>, and the third form (where the = sign in H<sub>o<\/sub>\u00a0is challenged by \u2260) is called a\u00a0<em class=\"italic\">two-sided alternative.<\/em>\u00a0To understand the intuition behind these names, let\u2019s go back to our examples.<\/p>\r\n<p id=\"c4167172cdb546ca86bc431ef41ec41b\">Example 3 (death penalty) is a case where we have a two-sided alternative:<\/p>\r\n<p id=\"a5a9dc1974c24055b7a52d3e7235fb31\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">o<\/em><\/sub><em 
class=\"italic\">:<\/em>\u00a0p = .64 (No change from 2003).<\/p>\r\n<p id=\"e3f430512abb4171a72975f8ed93eb1a\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">a<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0p \u2260 .64 (Some change since 2003).<\/p>\r\n<p id=\"e986b2c250714e16ba232d9be949218b\">In this case, in order to reject H<sub>o<\/sub>\u00a0and accept H<sub>a<\/sub>, we need a sample proportion of death penalty supporters that is very different from .64\u00a0<em class=\"italic\">in either direction<\/em>: either much larger or much smaller than .64.<\/p>\r\n<p id=\"e67f5e2294f14c13bdad42972b2fab41\">In Example 2 (marijuana use), we have a one-sided alternative:<\/p>\r\n<p id=\"aa82464505bf45f8b875e8a1882cbc50\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">o<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0p = .157 (Same as among all college students in the country).<\/p>\r\n<p id=\"a52c39ddf41941b686bbe00605407041\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">a<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0p &gt; .157 (Higher than the national figure).<\/p>\r\n<p id=\"ca5871f8af784ef5b6b8e9d80e5ce51c\">Here, in order to reject H<sub>o<\/sub>\u00a0and accept H<sub>a<\/sub>, we need a sample proportion of marijuana users that is much\u00a0<em class=\"italic\">higher<\/em>\u00a0than .157.<\/p>\r\n<p id=\"d1040674318448c3ba3c9f72393b4a66\">Similarly, in Example 1 (defective products), where we are testing:<\/p>\r\n<p id=\"ed77295908ea46c4a167b56c10286b0f\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">o<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0p = .20 (No change; the repair did not help).<\/p>\r\n<p id=\"bf52650efd4d486cb16eb307bd4c3caf\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">a<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0p &lt; .20 (The repair was effective).<\/p>\r\n<p id=\"fd5e6e882cbd4e8f880a853df0dd958f\">in order to reject H<sub>o<\/sub>\u00a0and accept H<sub>a<\/sub>, 
we need a sample proportion of defective products that is much\u00a0<em class=\"italic\">smaller<\/em>\u00a0than .20.<\/p>\r\n\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p id=\"da1f1d09db494d5f8d62557911a36238\">In each of the following examples, a test for the population proportion (p) is called for. You are asked to select the correct null and alternative hypotheses.<\/p>\r\n<p id=\"e7d5064fc10148c18f2ff168751cd53f\"><em class=\"italic\">Scenario 1:\u00a0<\/em>The UCLA Internet Report (February 2003) estimated that roughly 8.7% of Internet users are extremely concerned about credit card fraud when buying online. Has that figure changed since? To test this, a random sample of 100 Internet users was chosen, and when interviewed, 10 said that they were extremely worried about credit card fraud when buying online. Let p be the proportion of all Internet users who are extremely concerned about credit card fraud when buying online.<\/p>\r\n[h5p id=\"164\"]\r\n\r\n<em class=\"italic\">Scenario 2:\u00a0<\/em>The UCLA Internet Report (February 2003) estimated that a proportion of roughly .75 of online homes are still using dial-up access, but claimed that the use of dial-up is declining. Is that really the case? To examine this, a follow-up study was conducted a year later; in a random sample of 1,308 households that had Internet access, 804 were connecting using a dial-up modem. Let p be the proportion of all U.S. Internet-using households that have dial-up access.\r\n\r\n[h5p id=\"165\"]\r\n\r\n<em class=\"italic\">Scenario 3:\u00a0<\/em>According to the UCLA Internet Report (February 2003), the use of the Internet at home is growing steadily, and it is estimated that roughly 59.3% of households in the United States have Internet access at home. Has that trend continued since the report was released? 
To study this, a random sample of 1,200 households from a big metropolitan area was chosen for a more recent study, and it was found that 972 had an Internet connection. Let p be the proportion of U.S. households that have Internet access.\r\n\r\n[h5p id=\"166\"]\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p id=\"ee1138e762374c23b62954f9649fa8f6\">In each of the following examples, a test for the population proportion (p) is called for. You are asked to select the correct null and alternative hypotheses.<\/p>\r\n<p id=\"b88cca7da3174f5fb92196603651922c\"><em class=\"italic\">Scenario 1:<\/em>\u00a0Shirts occasionally come off the production line with defects (such as improper stitching), but too many defective shirts can be a sign of substandard manufacturing.<\/p>\r\n<p id=\"eba4fc456d9c43ccb9a9fbee2338d257\">Suppose that in the past your favorite department store has had only one defective shirt per 200 shirts (a prior defective rate of only .005). However, you suspect that the store has recently switched to a substandard manufacturer, so you decide to test whether its overall proportion of defective shirts is now higher.<\/p>\r\n<p id=\"f8fa317ba4334f25829c1ff80dc0563d\">Suppose that, in a random sample of 200 shirts from the store, you find that 27 of them are defective, for a sample proportion of defective shirts of .135. 
You want to test whether this is evidence that the store is \"guilty\" of substandard manufacturing, compared with its prior rate of defective shirts.<\/p>\r\n[h5p id=\"167\"]\r\n<p id=\"d9c5388913c5427b95c1e3db243d178f\"><em class=\"italic\">Scenario 2:<\/em>\u00a0It is a known medical fact that slightly fewer females than males are born (although the reasons are not completely understood); the known \"proper\" baseline female birth rate is about 49%.<\/p>\r\n<p id=\"ead7f3b3339f4d458acfd785df6438a8\">In some cultures, male children are traditionally looked on more favorably than female children, and there is concern that the increasing availability of ultrasound may lead to pregnant mothers deciding to abort the fetus if it\u2019s not the culturally \"desired\" gender. If this is happening, then the proportion of females born in those nations would be significantly lower than the proper baseline rate.<\/p>\r\n<p id=\"e718a98a9fe24641bc5f223eeaec5329\">To test whether the proportion of females born in India is lower than the proper baseline female birth rate, a study investigates a random sample of 6,500 births from hospital files in India and finds that 44.8% of the births in the sample are female.<\/p>\r\n[h5p id=\"168\"]\r\n<p id=\"e3c852487f2f42f29c4f5f2b5a0b74f2\"><em class=\"italic\">Scenario 3:<\/em>\u00a0A properly balanced 6-sided game die should give a 1 in exactly 1\/6 (16.7%) of all rolls. A casino wants to test its game die. If the die is unbalanced one way or another, it could give either too many 1\u2019s or too few 1\u2019s, either of which would be a problem.<\/p>\r\n<p id=\"de34e7a6eb3f49c0affc22dbf227b33d\">The casino wants to use the proportion of 1\u2019s to test whether the die is out of balance, so it test-rolls the die 60 times and gets a 1 in 9 of the rolls (15%).<\/p>\r\n[h5p id=\"169\"]\r\n\r\n<\/div>\r\n<\/div>\r\n<h2><span title=\"Quick scroll up\">2. 
Collecting and Summarizing the Data (Using a Test Statistic)<\/span><\/h2>\r\n<p id=\"adc378129b3b43e9b1a258185671a280\">After the hypotheses have been stated, the next step is to obtain a\u00a0<em class=\"italic\">sample<\/em>\u00a0(on which the inference will be based),\u00a0<em class=\"italic\">collect relevant data<\/em>, and\u00a0<em class=\"italic\">summarize<\/em>\u00a0them.<\/p>\r\n<p id=\"b19bec55934f4bada7dda9ecfa0f6f2a\">It is extremely important that our sample be representative of the population about which we want to draw conclusions. The best way to achieve this is to choose the sample at\u00a0<em class=\"italic\">random.<\/em>\u00a0Beyond the practical goal of representativeness, random sampling also has theoretical importance, which we will mention later.<\/p>\r\n<p id=\"caeda4d28b7c413cb57fede3867d0a4e\">In the case of hypothesis testing for the population proportion (p), we collect data on the relevant categorical variable from the individuals in the sample and start by calculating the sample proportion,\u00a0[latex]\\hat{p}[\/latex]\u00a0(the natural quantity to calculate when the parameter of interest is p).<\/p>\r\n<p id=\"efab1f8afd8946f18246144980e7cd0b\">Let\u2019s go back to our three examples and add this step to our figures.<\/p>\r\n\r\n<div id=\"cb64c0710e294728960efcbb9f5db270\" class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example 1<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<div><span class=\"imagewrap\"><span class=\"image\"><img id=\"d6eee5c7dccd496fa22b11bb430adfa3\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population of products produced by the machine (following the repair). We want to know p for this population, that is, the proportion of defective products. 
The question we wish to answer is &quot;is p still .20 or has it been reduced?&quot; We take a sample of 400 products, represented by a smaller circle. We find that 64 of these are defective. p-hat = 64\/400 = .16\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image221.gif\" alt=\"A large circle represents the population of products produced by the machine (following the repair). We want to know p for this population, that is, the proportion of defective products. The question we wish to answer is &quot;is p still .20 or has it been reduced?&quot; We take a sample of 400 products, represented by a smaller circle. We find that 64 of these are defective. p-hat = 64\/400 = .16\" \/><\/span><\/span><\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div id=\"fca2b1f104444259821e20b81c6c79c9\" class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example 2<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<div><span class=\"imagewrap\"><span class=\"image\"><img id=\"cada7271b18a4ea6abafc85c4f8b64c3\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population of students at the college. We want to know p for this population, that is, the proportion of students who use marijuana. The question we wish to answer is &quot;is p .157 (like the national figure) or higher?&quot; We take a sample of 100 students, represented by a smaller circle. We find that 19 use marijuana. p-hat = 19\/100 = .19\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image222.gif\" alt=\"A large circle represents the population of students at the college. 
We want to know p about this population, or what is the population proportion of students using marijuana. The question we wish to answer is &amp;quot;is p .157 (like the national figure) or higher?&amp;quot; We take a sample of 100 students, represented by a smaller circle. We find that 19 use marijuana. p-hat = 19\/100 = .19\" \/><\/span><\/span><\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div id=\"e84c10da140942b297cfeeb579122d16\" class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example 3<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<div><span class=\"imagewrap\"><span class=\"image\"><img id=\"d7e870431bc34392b3c9effbe73ebe6f\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population of U.S. adults. We want to know p about this population, which is the proportion of U.S. adults who support the death penalty. The question we wish to answer is &amp;quot;has p changed since 2003 (when it was .64)?&amp;quot; We take a sample of 1000 U.S. adults, represented by a smaller circle. We find that 675 are in favor. p-hat = 675\/1000 = .675\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image223.gif\" alt=\"A large circle represents the population of U.S. adults. We want to know p about this population, which is the proportion of U.S. adults who support the death penalty. The question we wish to answer is &amp;quot;has p changed since 2003 (when it was .64)?&amp;quot; We take a sample of 1000 U.S. adults, represented by a smaller circle. We find that 675 are in favor. 
p-hat = 675\/1000 = .675\" \/><\/span><\/span><\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<p id=\"c48234d8f94b4219bc3a8fccdda9ec06\">As we mentioned earlier without going into details, when we summarize the data in hypothesis testing, we go a step beyond calculating the sample statistic and summarize the data with a\u00a0<em class=\"italic\">test statistic<\/em>. Every test has a test statistic, which to some degree captures the essence of the test. In fact, the p-value, which so far we have looked upon as \u201cthe king\u201d (in the sense that everything is determined by it), is actually determined by (or derived from) the test statistic. We will now gradually introduce the test statistic.<\/p>\r\n<p id=\"cbe59fac565d4242bfc2e0bb36c2b657\">The test statistic is\u00a0<em class=\"italic\">a measure<\/em>\u00a0of how far the sample proportion\u00a0[latex]\\hat{p}[\/latex]\u00a0is from the null value\u00a0p<sub>0<\/sub>, the value that the null hypothesis claims is the value of p. 
In other words, since\u00a0[latex]\\hat{p}[\/latex]\u00a0is what the data estimate p to be, the test statistic can be viewed as a measure of the \u201cdistance\u201d between what the data tell us about p and what the null hypothesis claims p to be.<\/p>\r\n<p id=\"aa17be4cdf5e447fb1b7693c4264f862\">Let\u2019s use our examples to understand this:<\/p>\r\n\r\n<div id=\"d51808214b4840e59d40b7ccd690f880\" class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example 1<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<div>\r\n<p id=\"ec6ed2b745e34d5f897b0cad26770e66\">The parameter of interest is p, the proportion of defective products following the repair.<\/p>\r\n<p id=\"b2b2acbe3ebd4bbeb58c0f701b0f4320\">The data estimate p to be\u00a0[latex]\\hat{p}=.16[\/latex].<\/p>\r\n<p id=\"af0389fe4da14dab94c6488d22e0ac66\">The null hypothesis claims that p = .20.<\/p>\r\n<p id=\"e43c8b5aad1e47179f58687b7000b2e6\">The data are therefore .04 (or 4 percentage points) below the null hypothesis with respect to what they each tell us about p.<\/p>\r\n<p id=\"fe02f60b7f1f457da29d64c92e747170\">It is hard to evaluate whether this difference of 4 percentage points in defective products is enough evidence to say that the repair was effective, but clearly, the larger the difference, the stronger the evidence against the null hypothesis. 
If, for example, our sample proportion of defective products had been .10 instead of .16, most people would agree that cutting the proportion of defective products in half (from 20% to 10%) would be extremely strong evidence that the repair was effective.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div id=\"e12cfe88fb7c4e88b023fdc3217f0671\" class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example 2<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<div>\r\n<p id=\"c13163a9bbff41b0801fb544c6d25450\">The parameter of interest is p, the proportion of students in a college who use marijuana.<\/p>\r\n<p id=\"aada2df0a98d421badf65c62642c3400\">The data estimate p to be\u00a0[latex]\\hat{p}=.19[\/latex].<\/p>\r\n<p id=\"ac527ef74ba94e659c609d2a76860825\">The null hypothesis claims that p = .157.<\/p>\r\n<p id=\"df6f8935e8244b18a73b7d0ffa885217\">The data are therefore .033 (or 3.3 percentage points) above the null hypothesis with respect to what they each tell us about p.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div id=\"ebd97f453f8f4bdc9764fd9dea274bc8\" class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example 3<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<div>\r\n<p id=\"cd3c3e78f8474807ac11dd2bce9872ed\">The parameter of interest is p, the proportion of U.S. 
adults who support the death penalty for convicted murderers.<\/p>\r\n<p id=\"cd1921ba82ef4c9bb74052e9f1c50c26\">The data estimate p to be\u00a0[latex]\\hat{p}=.675[\/latex].<\/p>\r\n<p id=\"cb647ce7f2264fd68df5c21e2a56954a\">The null hypothesis claims that p = .64.<\/p>\r\n<p id=\"c99c4f21bcec4e689b961c3e3f89ff25\">There is a difference of .035 (3.5 percentage points) between the data and the null hypothesis with respect to what they each tell us about p.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<p id=\"b2042fbcc29d416cab4e2f35d01035bb\">There is a problem with just looking at the difference between the sample proportion\u00a0[latex]\\hat{p}[\/latex]\u00a0and the null value\u00a0p<sub>0<\/sub>.<\/p>\r\n<p id=\"d7e7b48c540e46fa981950b02f8502bc\">Examples 2 and 3 illustrate this problem very well.<\/p>\r\n<p id=\"a746195cf1814d06977f5babc6a1a8d6\">In Example 2, the data and the null hypothesis differ by 3.3 percentage points, about the same as the 3.5-percentage-point difference in Example 3. However, the difference in Example 3 is based on a\u00a0<em class=\"italic\">sample of size 1,000<\/em>, and because sample proportions from larger samples vary less from sample to sample, it is much\u00a0<em class=\"italic\">more impressive<\/em>\u00a0than the difference in Example 2, which was obtained from a sample of only 100.<\/p>\r\n\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\n[h5p id=\"170\"]\r\n\r\n<\/div>\r\n<\/div>\r\n<p id=\"a0dd297f213748b3b764c2373e4a42aa\">For the reason illustrated in Examples 2 and 3, the test statistic cannot simply be the difference\u00a0[latex]\\hat{p}-p_{0}[\/latex]; it must be some form of that difference that accounts for the sample size. 
In other words, we need to standardize the difference\u00a0[latex]\\hat{p}-p_{0}[\/latex]\u00a0so that different situations can be compared. We are very close to revealing the test statistic, but before we construct it, let\u2019s recall the following two facts from probability:<\/p>\r\n<p id=\"d9965edfe551454abfbf42220620bc0b\">1. When we take a random sample of size n from a population with population proportion p, the possible values of the sample proportion\u00a0[latex]\\hat{p}[\/latex]\u00a0(when certain conditions are met) have approximately a normal distribution with:<\/p>\r\n<p id=\"faa922e36b664a22938f86f76d04f504\">mean: p<\/p>\r\n<p>standard deviation:\u00a0[latex]\\sqrt{\\frac{p\\left(1-p\\right)}{n}}[\/latex]<\/p>\r\n<p id=\"eccbc94de1c74c378be8497bd48dc427\">2. The z-score of a normal value (a value that comes from a normal distribution) is:<\/p>\r\n[latex]z=\\frac{\\text{value}-\\text{mean}}{\\text{standard deviation}}[\/latex]\r\n<p id=\"b7bb3568d36c4a11b48ecc58a58ebaf4\">and it represents how many standard deviations below or above the mean the value is.<\/p>\r\n<p id=\"fec1f4cf967042f18b9441e9e117f2b0\">We are finally ready to reveal the test statistic:<\/p>\r\n<p id=\"c51f2023a5334cada59f3592f6a7ca50\">The test statistic for this test measures the difference between the sample proportion\u00a0[latex]\\hat{p}[\/latex]\u00a0and the null value\u00a0p<sub>0<\/sub>\u00a0by the z-score (standardized score) of the sample proportion\u00a0[latex]\\hat{p}[\/latex], assuming that the null hypothesis is true (i.e., assuming that\u00a0[latex]p=p_{0}[\/latex]).<\/p>\r\n<p id=\"a79b6af46d2c479ea4c5494f61e6bf66\">From fact 1, we know that the values of the sample proportion\u00a0[latex]\\hat{p}[\/latex]\u00a0are approximately normally distributed. If the null hypothesis is true, their mean is p<sub>0<\/sub>\u00a0and their standard deviation is\u00a0[latex]\\sqrt{\\frac{p_0\\left(1-p_0\\right)}{n}}[\/latex].<\/p>\r\n<p id=\"c3c4d57100ab45debadd839fb3e0b348\">Using fact 2, we conclude that the z-score of\u00a0[latex]\\hat{p}[\/latex]\u00a0when\u00a0[latex]p=p_{0}[\/latex]\u00a0is:<\/p>\r\n[latex]z=\\frac{\\hat{p}-p_0}{\\sqrt{\\frac{p_0\\left(1-p_0\\right)}{n}}}[\/latex]\r\n<p id=\"aaad3420d435457aad88e35008c68bcc\"><em class=\"italic\">This is the test statistic.<\/em>\u00a0It represents the difference between the sample proportion ([latex]\\hat{p}[\/latex]) and the null value ([latex]p_{0}[\/latex]), measured in standard deviations.<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"bbfa1e0dd51a44e9a126a5d37e88694d\" class=\"img-responsive popimg aligncenter\" title=\"A normal curve representing the sampling distribution of p-hat assuming that p=p_0. Marked on the horizontal axis is p_0 and a particular value of p-hat. z is the difference between p-hat and p_0 measured in standard deviations (with the sign of z indicating whether p-hat is below or above p_0)\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image237.gif\" alt=\"A normal curve representing the sampling distribution of p-hat assuming that p=p_0. Marked on the horizontal axis is p_0 and a particular value of p-hat. 
z is the difference between p-hat and p_0 measured in standard deviations (with the sign of z indicating whether p-hat is below or above p_0)\" \/><\/span><\/span>\r\n<p id=\"c5f1e7a14ea1485ab3b7bf305e8d801c\">The figure above represents the sampling distribution of\u00a0[latex]\\hat{p}[\/latex], assuming p = p<sub>0<\/sub>. In other words, it is a model of how values of\u00a0[latex]\\hat{p}[\/latex]\u00a0behave when we draw random samples from a population for which H<sub>0<\/sub>\u00a0is true. Notice that the center of the sampling distribution is at p<sub>0<\/sub>, the hypothesized proportion given in the null hypothesis (H<sub>0<\/sub>: p = p<sub>0<\/sub>). We could also mark the axis in standard deviation units,\u00a0[latex]\\sqrt{\\frac{p_0\\left(1-p_0\\right)}{n}}[\/latex]. For example, if our null hypothesis claims that the proportion of U.S. adults supporting the death penalty is 0.64, then we draw the sampling distribution as if the null hypothesis is true: a normal distribution centered at p<sub>0<\/sub>\u00a0= 0.64 with a standard deviation that depends on the sample size n,\u00a0[latex]\\sqrt{\\frac{0.64(1-0.64)}{n}}[\/latex]. For the sample of 1,000 adults in Example 3, this standard deviation is about 0.015.<\/p>\r\n\r\n<div id=\"bc9eff87f72c41ed979bf99005ac4c76\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Important Comment<\/span><\/h2>\r\n<p id=\"f0e0381cf06048aeade4d8b4e3091273\">Note that under the assumption that H<sub>0<\/sub>\u00a0is true (i.e.,\u00a0[latex]p=p_{0}[\/latex]), the test statistic, because it is a z-score, has (approximately) the standard normal distribution, N(0,1). A common way to say the same thing is: \u201cThe null distribution of the test statistic is N(0,1).\u201d By \u201cnull distribution,\u201d we mean the distribution under the assumption that H<sub>0<\/sub>\u00a0is true. As we\u2019ll see and stress again later, the calculation of the p-value is based on the null distribution of the test statistic.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<p id=\"e21a438e7b604cb58cc3db63a9e8bee5\">Let\u2019s go back to our three examples and find the test statistic in each case:<\/p>\r\n\r\n<div id=\"a1898c10c25c4490b1c846fc718cffab\" class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example 1<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<div>\r\n\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"fe376f2393d1442abae7b9901f2cd2eb\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population of products produced by the machine (following the repair). We want to know p about this population, or what is the proportion of defective products. 
The question we wish to answer is &amp;quot;is p still 0.20 or has it been reduced?&amp;quot; We take a sample of 400 products, represented by a smaller circle. We find that 64 of these are defective. p-hat = 64\/400 = 0.16, and z = -2.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image238.gif\" alt=\"A large circle represents the population of products produced by the machine (following the repair). We want to know p about this population, or what is the proportion of defective products. The question we wish to answer is &amp;quot;is p still 0.20 or has it been reduced?&amp;quot; We take a sample of 400 products, represented by a smaller circle. We find that 64 of these are defective. p-hat = 64\/400 = 0.16, and z = -2.\" \/><\/span><\/span>\r\n<p id=\"d5e63f4ec5444a56b5a181f0ce762260\">Since the null hypothesis is H<sub>0<\/sub>: p = 0.20, the standardized score of\u00a0[latex]\\hat{p}=.16[\/latex]\u00a0is:\u00a0[latex]z=\\frac{.16-.20}{\\sqrt{\\frac{.20\\left(1-.20\\right)}{400}}}=-2[\/latex]<\/p>\r\n<p id=\"cccaf387ae5c4e2798a67813f3bcfa48\">This is the value of the test statistic for this example.<\/p>\r\n<p id=\"b703630313bb43a2a81e3041240c269a\">What does this tell us?<\/p>\r\n<p id=\"e0ec45140d2b4a10b1bf0576e2c90341\">This z-score of \u22122 tells us that (assuming that H<sub>0<\/sub>\u00a0is true) the sample proportion\u00a0[latex]\\hat{p}=.16[\/latex]\u00a0is 2 standard deviations below the null value (0.20).<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div id=\"d362897edcb748a69d7ee921aec0a8eb\" class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example 2<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<div>\r\n\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"e7458122c2aa458f88586f110282d431\" class=\"img-responsive 
popimg\" title=\"A large circle represents the population Students at the college. We want to know p about this population, or what is the population proportion of students using marijuana. The question we wish to answer is &amp;quot;is p .157 (like the national figure) or higher?&amp;quot; We take a sample of 100 students, represented by a smaller circle. We find that 19 use marijuana. p-hat = 19\/100 = 0.19, and z = 0.91\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image241.gif\" alt=\"A large circle represents the population Students at the college. We want to know p about this population, or what is the population proportion of students using marijuana. The question we wish to answer is &amp;quot;is p .157 (like the national figure) or higher?&amp;quot; We take a sample of 100 students, represented by a smaller circle. We find that 19 use marijuana. p-hat = 19\/100 = 0.19, and z = 0.91\" \/><\/span><\/span>\r\n<p id=\"b40f0100985e44fd9885905b7057b7ac\">Since the null hypothesis is H<sub>0<\/sub>: p = 0.157, the standardized score of\u00a0<span id=\"MathJax-Element-24-Frame\" class=\"mjx-chtml MathJax_CHTML\"><span id=\"MJXc-Node-251\" class=\"mjx-math\"><span id=\"MJXc-Node-252\" class=\"mjx-mrow\"><span id=\"MJXc-Node-253\" class=\"mjx-mrow\"><span id=\"MJXc-Node-254\" class=\"mjx-mover\"><span class=\"mjx-stack\"><span class=\"mjx-over\"><span id=\"MJXc-Node-256\" class=\"mjx-mo\"><span class=\"mjx-char MJXc-TeX-main-R\">\u02c6<\/span><\/span><\/span><span class=\"mjx-op\"><span id=\"MJXc-Node-255\" class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">p<\/span><\/span><\/span><\/span><\/span><span id=\"MJXc-Node-257\" class=\"mjx-mo MJXc-space3\"><span class=\"mjx-char MJXc-TeX-main-R\">=<\/span><\/span><span id=\"MJXc-Node-258\" class=\"mjx-mo\"><span class=\"mjx-char MJXc-TeX-main-R\">.<\/span><\/span><span id=\"MJXc-Node-259\" class=\"mjx-mn 
MJXc-space1\"><span class=\"mjx-char MJXc-TeX-main-R\">19<\/span><\/span><\/span><\/span><\/span><\/span>\u00a0is:\u00a0[latex]\\mathcal{z}=\\frac{.19-.157}{\\sqrt{\\frac{.157\\left(1-.157\\right)}{100}}}\\approx.91[\/latex].<\/p>\r\n<p id=\"b7f6ea63d42543babbf8609bacd758e0\">This is the value of the test statistic for this example.<\/p>\r\n<p id=\"f0d424974e994c3cb178ad300ac4010f\">We interpret this to mean that, assuming that H<sub>0<\/sub>\u00a0is true, the sample proportion\u00a0<span id=\"MathJax-Element-26-Frame\" class=\"mjx-chtml MathJax_CHTML\"><span id=\"MJXc-Node-290\" class=\"mjx-math\"><span id=\"MJXc-Node-291\" class=\"mjx-mrow\"><span id=\"MJXc-Node-292\" class=\"mjx-mrow\"><span id=\"MJXc-Node-293\" class=\"mjx-mover\"><span class=\"mjx-stack\"><span class=\"mjx-over\"><span id=\"MJXc-Node-295\" class=\"mjx-mo\"><span class=\"mjx-char MJXc-TeX-main-R\">\u02c6<\/span><\/span><\/span><span class=\"mjx-op\"><span id=\"MJXc-Node-294\" class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">p<\/span><\/span><\/span><\/span><\/span><span id=\"MJXc-Node-296\" class=\"mjx-mo MJXc-space3\"><span class=\"mjx-char MJXc-TeX-main-R\">=<\/span><\/span><span id=\"MJXc-Node-297\" class=\"mjx-mo\"><span class=\"mjx-char MJXc-TeX-main-R\">.<\/span><\/span><span id=\"MJXc-Node-298\" class=\"mjx-mn MJXc-space1\"><span class=\"mjx-char MJXc-TeX-main-R\">19<\/span><\/span><\/span><\/span><\/span><\/span>\u00a0is 0.91 standard deviations above the null value (0.157).<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div id=\"bb48e79a3d5d4a669663acca7caa56d8\" class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example 3<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<div>\r\n\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"a2b37da0c9ed42e3b036b15b76e651c7\" class=\"img-responsive popimg\" title=\"A large circle 
represents the population of U.S. adults. We want to know p about this population, which is the proportion of U.S. adults who support the death penalty. The question we wish to answer is &amp;quot;has p changed since 2003 (when it was 0.64)?&amp;quot; We take a sample of 1000 U.S. adults, represented by a smaller circle. We find that 675 are in favor. p-hat = 675\/1000 = 0.675, and z = 2.31\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image244.gif\" alt=\"A large circle represents the population of U.S. adults. We want to know p about this population, which is the proportion of U.S. adults who support the death penalty. The question we wish to answer is &amp;quot;has p changed since 2003 (when it was 0.64)?&amp;quot; We take a sample of 1000 U.S. adults, represented by a smaller circle. We find that 675 are in favor. p-hat = 675\/1000 = 0.675, and z = 2.31\" \/><\/span><\/span>\r\n<p id=\"b9cac5a4a24c4dc8bb353b73ebe8ac86\">Since the null hypothesis is H<sub>0<\/sub>: p = 0.64, the standardized score of\u00a0[latex]\\hat{p}=.675[\/latex]\u00a0is:\u00a0[latex]z=\\frac{.675-.64}{\\sqrt{\\frac{.64\\left(1-.64\\right)}{1000}}}\\approx2.31[\/latex].<\/p>\r\n<p id=\"bc67b26077bd4dc8b099b826961a7b50\">This is the value of the test statistic for this example.<\/p>\r\n<p id=\"b5f083ee64994674a33ada5ba2ecc05b\">We interpret this to mean that, assuming that H<sub>0<\/sub>\u00a0is true, the sample proportion\u00a0[latex]\\hat{p}=.675[\/latex]\u00a0is 2.31 standard deviations above the null value (0.64).<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\nWe think that the most common color for automobiles is silver and that 24% of all automobiles sold are silver. 
We take a random sample of 225 cars and find that 63 of them are silver.\r\n\r\n[h5p id=\"171\"]\r\n\r\nIf we take a different random sample and get a test statistic of zero, what can we conclude? Mark each statement as true or false.\r\n\r\n[h5p id=\"172\"]\r\n\r\n<\/div>\r\n<\/div>\r\n<h2><span title=\"Quick scroll up\">Comments about the Test Statistic<\/span><\/h2>\r\n<p id=\"b880260189f949dba35350988036dbec\">1. We mentioned earlier that to some degree, the test statistic captures the essence of the test. In this case, the test statistic measures the difference between\u00a0[latex]\\hat{p}[\/latex]\u00a0and\u00a0[latex]p_{0}[\/latex]\u00a0in standard deviations. This is exactly what this test is about: get data and look at the discrepancy between what the data estimate p to be (represented by\u00a0[latex]\\hat{p}[\/latex]) and what H<sub>0<\/sub>\u00a0claims about p (represented by\u00a0[latex]p_{0}[\/latex]).<\/p>\r\n<p id=\"bd6fc1ea455449cb962d3496a751f275\">2. You can think about this test statistic as a measure of the evidence in the data against H<sub>0<\/sub>. 
The larger the test statistic in absolute value (that is, the farther it is from 0 in either direction), the \u201cfurther the data are from H<sub>0<\/sub>\u201d and therefore the more evidence the data provide against H<sub>0<\/sub>.<\/p>\r\n\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p id=\"ab8b39f7c5674357aa5db9ace49d0bd0\">The UCLA Internet Report (February 2003) estimated that roughly 75% of online homes were still using dial-up access, but claimed that the use of dial-up was declining. Is that really the case? To examine this, a follow-up study was conducted a year later in which, out of a random sample of 1,308 households that had Internet access, 804 were connecting using a dial-up modem.<\/p>\r\n<p id=\"bc62bc823a1d4e52ab2af40c87870515\">Let p be the proportion of all U.S. Internet-using households who have dial-up access. In the previous activity, we established that the appropriate hypotheses here are:<\/p>\r\n<p id=\"b096bc8aaf0b49428c7734fa66dc7b8f\">H<sub>0<\/sub>: p = 0.75 and H<sub>a<\/sub>: p &lt; 0.75<\/p>\r\n\r\n<div class=\"asx \">\r\n<div id=\"du4_m2_testprop4_tutor3\" class=\"activitywrap sectionNest flash\">\r\n<div class=\"activityhead\">\r\n<div class=\"activityinfo\"><\/div>\r\n<\/div>\r\n<div class=\"actContain\">\r\n<div class=\"activity flash\">\r\n<div id=\"u4_m2_testprop4_tutor3\" class=\"flash_obj asx testFlash mark_flash\">\r\n<div id=\"ou4_m2_testprop4_tutor3\" class=\"page 2962884\">\r\n<div id=\"2962884\" class=\"question\">\r\n<div>Based on the data, what is the sample proportion of Internet households that use a dial-up connection?<\/div>\r\n<\/div>\r\n<div>[h5p id=\"173\"]<\/div>\r\n<div>[h5p id=\"174\"]<\/div>\r\n<div><\/div>\r\n<\/div>\r\n<\/div>\r\n<div>Ann and Sam are both testing the hypothesis that 40% of plain M&amp;M\u2019s are orange, H<sub>0<\/sub>: p = 0.40. Ann draws a sample of M&amp;M\u2019s and 45% of her sample are orange. 
She calculates a test statistic of z = 1.25. Sam draws a sample of M&amp;M\u2019s and 50% of his sample are orange. He calculates a test statistic of z = 1.<\/div>\r\n<div>What can we conclude? Mark each statement as true or false.<\/div>\r\n<div>[h5p id=\"175\"]<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<h2><span title=\"Quick scroll up\">Comments<\/span><\/h2>\r\n<ol>\r\n \t<li style=\"list-style-type: none\">\r\n<ol>\r\n \t<li>\r\n<p id=\"b7c0b845b8d24bea9e2a398e26c5be5d\">It should now be clear why this test is commonly known as\u00a0<em class=\"italic\">the z-test for the population proportion<\/em>. The name comes from the fact that it is based on a test statistic that is a\u00a0<em class=\"italic\">z-score.<\/em><\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"c61582f93643441c9bf0a647ee1c51ee\">Recall fact 1, which we used to construct the z-test statistic. Here is part of it again:<\/p>\r\n<p id=\"d584b12403f44ff4836e94feb59df391\">When we take a\u00a0<em class=\"italic\">random<\/em>\u00a0sample of size n from a population with population proportion p, the possible values of the sample proportion ([latex]\\hat{p}[\/latex]) (<em class=\"italic\">when certain conditions are met<\/em>) have approximately a normal distribution with a mean of \u2026 and a standard deviation of \u2026.<\/p>\r\n<p id=\"c2244c33b8b74e8aa1609a584fef0013\">This result provides the theoretical justification for constructing the test statistic the way we did, and therefore the assumptions under which this result holds (in italics, above) are the conditions that our data need to satisfy so that we can use this test. 
These two conditions are:<\/p>\r\n\r\n<ol id=\"e9b9af0e22524967ac4bc7f27666e28b\" class=\"lower-roman\">\r\n \t<li>\r\n<p id=\"f4df1bbb543e440f80a81d8b80ee4800\">The sample has to be random.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"ecc0ce2df03240818a375f6051ada188\">The conditions under which the sampling distribution of\u00a0[latex]\\hat{p}[\/latex]\u00a0is normal are met. In other words:<\/p>\r\n<\/li>\r\n<\/ol>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"bf69f9e01c6a4afaa5fde0d2811fdf7d\" class=\"img-responsive popimg aligncenter\" title=\"n \u00d7 p_0 \u2265 10 and n \u00d7 (1 - p_0) \u2265 10\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image248.gif\" alt=\"n \u00d7 p_0 \u2265 10 and n \u00d7 (1 - p_0) \u2265 10\" \/><\/span><\/span><\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<ol id=\"ed9e074a3a4a4e2fad1211db9b4649de\">\r\n \t<li><span class=\"imagewrap\"><span class=\"image\">Here we will pause to say more about condition (i.) above, the need for a random sample. In the Probability Unit we discussed sampling plans based on probability (such as a simple random sample, cluster, or stratified sampling) that produce a non-biased sample, which can be safely used in order to make inferences about a population. We noted in the Probability Unit that, in practice, other (non-random) sampling techniques are sometimes used when random sampling is not feasible. It is important though, when these techniques are used, to be aware of the type of bias that they introduce, and thus the limitations of the conclusions that can be drawn from them.<\/span><\/span><\/li>\r\n<\/ol>\r\n<p id=\"fdad1b74ea8d4dac8a6c5e2a0a7f8368\">For our purpose here, we will focus on one such practice, the situation in which a sample is not really chosen randomly, but in the context of the categorical variable that is being studied, the sample is regarded as random. 
For example, say that you are interested in the proportion of students at a certain college who suffer from seasonal allergies. For that purpose, the students in a large engineering class could be considered a random sample, since there is nothing about being in an engineering class that makes you more or less likely to suffer from seasonal allergies. Technically, the engineering class is a convenience sample, but it is treated as a random sample in the context of this categorical variable. On the other hand, if you are interested in the proportion of students in the college who have math anxiety, then the class of engineering students clearly could not be viewed as a random sample, since engineering students probably have a much lower incidence of math anxiety than the college population overall.<\/p>\r\n\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\nWe are conducting a survey to determine whether an upcoming bond measure will receive a majority vote in the county. 
The null hypothesis claims that p = 0.50, where p is the proportion of registered voters in the county who say they support the bond measure.\r\n\r\n[h5p id=\"176\"]\r\n\r\nLet's check the conditions in our three examples.\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h4 class=\"textbox__title\">Example 1<\/h4>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<ol id=\"ccb624427c0f4c778e312c5bf95d1d2f\" class=\"lower-roman\">\r\n \t<li>\r\n<p id=\"cfda7bb5cc1d44e6abfa0ff059fbf01f\">The 400 products were chosen at random.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"cedd5e481e1f493689a14abfc15004b3\">n = 400,\u00a0[latex]p_{0}=.20[\/latex], and therefore:<\/p>\r\n<\/li>\r\n<\/ol>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"b33c5e0598d5432589844f47e5b4a576\" class=\"img-responsive popimg aligncenter\" title=\"n \u00d7 p_0 = 80 \u2265 10 and n \u00d7 (1 - p_0) = 320 \u2265 10\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image250.gif\" alt=\"n \u00d7 p_0 = 80 \u2265 10 and n \u00d7 (1 - p_0) = 320 \u2265 10\" \/><\/span><\/span>\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h4 class=\"textbox__title\">Example 2<\/h4>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<ol id=\"c733d14354894e20b4c2b73502e69a28\" class=\"lower-roman\">\r\n \t<li>\r\n<p id=\"d7adcef89b4d4ef6af0c30ef26dc1b92\">The 100 students were chosen at random.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"c476cb4d438046128c0263fb7628bb87\">n = 100,\u00a0[latex]p_{0}=.157[\/latex], and therefore:<\/p>\r\n<\/li>\r\n<\/ol>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"c949fe1aa614458496ca8467b02d7bf0\" class=\"img-responsive popimg aligncenter\" title=\"n \u00d7 p_0 = 15.7 \u2265 10 and n \u00d7 (1 - p_0) = 84.3 \u2265 10\" 
src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image252.gif\" alt=\"n \u00d7 p_0 = 15.7 \u2265 10 and n \u00d7 (1 - p_0) = 84.3 \u2265 10\" \/><\/span><\/span>\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h4 class=\"textbox__title\">Example 3<\/h4>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<ol id=\"cea6e494e5eb4111b477a10c9d7a3735\" class=\"lower-roman\">\r\n \t<li>\r\n<p id=\"c6c861ea23ae48d8985e42d08e6e588d\">The 1,000 U.S. adults were chosen at random.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"b0cffdf4cccf46858b0f9b0938514bdd\">n = 1,000,\u00a0[latex]p_{0}=.64[\/latex], and therefore:<\/p>\r\n<\/li>\r\n<\/ol>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"adc052d024464c328c1ad1ba6c55ba21\" class=\"img-responsive popimg aligncenter\" title=\"n \u00d7 p_0 = 640 \u2265 10 and n \u00d7 (1 - p_0) = 360 \u2265 10\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image254.gif\" alt=\"n \u00d7 p_0 = 640 \u2265 10 and n \u00d7 (1 - p_0) = 360 \u2265 10\" \/><\/span><\/span>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\r\n<\/header>\r\n<div 
class=\"textbox__content\">\r\n<p id=\"b3c1c0895a3c48448db3da0f5fe3cee3\">In each of the following scenarios, you need to decide whether it is appropriate to use the z-test for the population proportion p, and if not, which condition is violated.<\/p>\r\n[h5p id=\"177\"]\r\n\r\n<\/div>\r\n<\/div>\r\nChecking that our data satisfy the conditions under which the test can be reliably used is a very important part of the hypothesis testing process. So far we haven\u2019t explicitly included it in the 4-step process of hypothesis testing, but now that we are discussing a specific test, you can see how it fits into the process. We are therefore now going to amend our 4-step process of hypothesis testing to include this extremely important part of the process.\r\n<div id=\"c0dd4d7b108f4541a1b10fb9dc99e326\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">The Four Steps in Hypothesis Testing<\/span><\/h2>\r\n<ol id=\"db37e2bc660248e6a3d1f63f26d4cf35\">\r\n \t<li>\r\n<p id=\"e379f2ed78cb4b09b9bae94d9fd4340f\">State the appropriate null and alternative hypotheses, H<sub>o<\/sub>\u00a0and H<sub>a<\/sub>.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"bd14c60503d349dabb31c3157373d576\">Obtain a random sample, collect relevant data, and\u00a0<em class=\"italic\">check whether the data meet the conditions under which the test can be used<\/em>. 
If the conditions are met, summarize the data using a test statistic.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"d2f4eca67afe43de9129e459c5d1d23e\">Find the p-value of the test.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"d60cac955f9543d08bae12e14d9b8d45\">Based on the p-value, decide whether or not the results are significant and\u00a0<em class=\"italic\">draw your conclusions in context.<\/em><\/p>\r\n<\/li>\r\n<\/ol>\r\n<p id=\"e8a9effc7c754ef6b9bf5c505174abb5\">With respect to the z-test for the population proportion that we are currently discussing:<\/p>\r\n<p id=\"b1550426a1984a8383c3a227b37480b9\">Step 1: Completed<\/p>\r\n<p id=\"b7b46e8455614f5d900183d824c2f49e\">Step 2: Completed<\/p>\r\n<p id=\"f2dd64b7d32c47e6b6a414b25ffc3e06\">Step 3: This is what we will work on next.<\/p>\r\n\r\n\r\n<hr \/>\r\n\r\n<div id=\"e1bbb3f1991e470383c9ac98867ac272\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">3. Finding the P-value of the Test<\/span><\/h2>\r\n<p id=\"ff9f322a67794311b756bf47ae9cef33\">So far we\u2019ve talked about the p-value at the intuitive level: understanding what it is (or what it measures) and how we use it to draw conclusions about the significance of our results. We will now go more deeply into how the p-value is calculated.<\/p>\r\n<p id=\"c3da15eff0eb4553a55a2f52fc88aca7\">Eventually we will rely on technology to calculate the p-value for us (as well as the test statistic). However, in order to make intelligent use of the output, it is important to first\u00a0<em class=\"italic\">understand<\/em>\u00a0the details, and only then let the computer do the calculations for us. Let\u2019s start.<\/p>\r\n<p id=\"fa64c70fb0dc4259af6bd90bd483c7b7\">Recall that so far we have said that the p-value is the probability of obtaining data like those observed, assuming that H<sub>o<\/sub>\u00a0is true. Like the test statistic, the p-value is therefore a measure of the evidence against H<sub>o<\/sub>. 
In the case of the\u00a0<em class=\"italic\">test statistic,<\/em>\u00a0the\u00a0<em class=\"italic\">larger<\/em> it is in magnitude (positive or negative) , the further [latex]\\hat{p}[\/latex] is from <em><strong>p<sub>0<\/sub><\/strong><\/em>\u00a0, the\u00a0<em class=\"italic\">more evidence we have against H<\/em><sub><em class=\"italic\">o<\/em><\/sub><em class=\"italic\">.\u00a0<\/em>In the case of the\u00a0<em class=\"italic\">p-value<\/em>, it is the opposite; the\u00a0<em class=\"italic\">smaller<\/em>\u00a0it is, the more unlikely it is to get data like those observed when H<sub>o<\/sub>\u00a0is true, the\u00a0<em class=\"italic\">more evidence it is against H<\/em><sub><em class=\"italic\">o<\/em><\/sub>. One can actually draw conclusions in hypothesis testing just using the test statistic, and as we\u2019ll see the p-value is, in a sense, just another way of looking at the test statistic. The reason that we actually take the extra step in this course and derive the p-value from the test statistic is that even though in this case (the test about the population proportion) and some other tests, the value of the test statistic has a very clear and intuitive interpretation, there are some tests where its value is not as easy to interpret. On the other hand, the p-value keeps its intuitive appeal across all statistical tests.<\/p>\r\n<p id=\"d10030a04d51418e867e4787ef1f43fe\"><em class=\"italic\">How is the p-value calculated?<\/em><\/p>\r\n<p id=\"dfe4effa5b6440fe935f40cb47f1a41a\">Intuitively, the p-value is the\u00a0<em class=\"italic\">probability<\/em>\u00a0of observing\u00a0<em class=\"italic\">data like those observed<\/em>\u00a0assuming that H<sub>o<\/sub>is true. 
Let\u2019s be a bit more formal:<\/p>\r\n\r\n<ul id=\"b844b1bd069e469d964a8efcd50f2cb1\">\r\n \t<li>\r\n<p id=\"e09aa657fb7f4d88bb27b374df321ab8\">Since this is a probability question about the\u00a0<em class=\"italic\">data<\/em>, it makes sense that the calculation will involve the data summary, the\u00a0<em class=\"italic\">test statistic.<\/em><\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"a95f87438dc94f779759d29a6dbafe58\">What do we mean by\u00a0<em class=\"italic\">\u201clike\u201d<\/em>\u00a0those observed? By \u201clike\u201d we mean\u00a0<em class=\"italic\">\u201cas extreme or even more extreme.\u201d<\/em><\/p>\r\n<\/li>\r\n<\/ul>\r\n<p id=\"bd23865caeea48a2afa430b4aaefd2bd\">Putting it all together, we get that in\u00a0<em class=\"italic\">general:<\/em><\/p>\r\n<p id=\"a59957028a7a4d8f9fc03a7494a23441\"><em class=\"italic\">The p-value is the probability of observing a test statistic as extreme as that observed (or even more extreme) assuming that the null hypothesis is true.<\/em><\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div id=\"ca3c4d07c39f453e9aab825883ea5ccd\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Comment<\/span><\/h2>\r\n<p id=\"ae98732c1dc64f2abe0a4d24b961d9a4\">By\u00a0<em class=\"italic\">\u201cextreme\u201d<\/em>\u00a0we mean extreme\u00a0<em class=\"italic\">in the direction of the alternative<\/em>\u00a0hypothesis.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<p id=\"c22952ba65914afc9c48481df4b3254b\"><em class=\"italic\">Specifically<\/em>, for the z-test for the population proportion:<\/p>\r\n\r\n<ol id=\"d525ad6b844b49009f80855371bf6c26\">\r\n \t<li>\r\n<p id=\"f6819d6f76b14971a23703943457a244\">If the alternative hypothesis is <em><strong>H<sub>a<\/sub> : p &lt; p<sub>0<\/sub><\/strong><\/em>\u00a0\u00a0(<em class=\"italic\">less<\/em>\u00a0than), then \u201cextreme\u201d means\u00a0<em class=\"italic\">small<\/em>, and the p-value is:<\/p>\r\n<p id=\"cb0ffa5351994515bf9320e5c6e8bc67\">The probability of 
observing a test statistic\u00a0<em class=\"italic\">as small as that observed or smaller<\/em>\u00a0if the null hypothesis is true.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"e17d3c401a304c138a8504a75ff5fb34\">If the alternative hypothesis is <em><strong>H<sub>a<\/sub> : p &gt; p<sub>0<\/sub><\/strong><\/em> (<em class=\"italic\">greater<\/em>\u00a0than), then \u201cextreme\u201d means\u00a0<em class=\"italic\">large<\/em>, and the p-value is:<\/p>\r\n<p id=\"d819367b7e3e40f8901d72d2c0982b9d\">The probability of observing a test statistic\u00a0<em class=\"italic\">as large as that observed or larger<\/em>\u00a0if the null hypothesis is true.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"f3b62e0ac469406d9b9a68c5f161a4cb\">if the alternative is\u00a0[latex]H_{a}:p\\neq p_{0}[\/latex]\u00a0(<em class=\"italic\">different<\/em>\u00a0from), then \u201cextreme\u201d means extreme in either direction\u00a0<em class=\"italic\">either small or large (i.e., large in magnitude)<\/em>, and the p-value therefore is:<\/p>\r\n<p id=\"e548ec3a0b874d8ea67466b123f47f99\">The probability of observing a test statistic\u00a0<em class=\"italic\">as large in magnitude as that observed or larger<\/em>\u00a0if the null hypothesis is true.<\/p>\r\n<\/li>\r\n<\/ol>\r\n<p id=\"a17cdf8a52284207b7e0bb86d80db43e\">(Examples: If z = -2.5: p-value = probability of observing a test statistic as small as -2.5 or smaller or as large as 2.5 or larger.<\/p>\r\n<p id=\"cdfefb7075114aa38bb46a0d00c8a1d8\">If z = 1.5: p-value = probability of observing a test statistic as large as 1.5 or larger, or as small as -1.5 or smaller.)<\/p>\r\n<p id=\"d30ad952fff94014a2fd975ca0416ab6\"><em class=\"italic\">OK, that makes sense. 
But how do we actually calculate it?<\/em><\/p>\r\n<p id=\"a431c2b26a80412c8c5f1784976d1e43\">Recall the important comment from our discussion of the test statistic,<\/p>\r\n[latex]z=\\frac{\\hat{p}-p_0}{\\sqrt{\\frac{p_0\\left(1-p_0\\right)}{n}}}[\/latex]\r\n<p id=\"e8df62deee7a4c5281277dd5c37bb907\">which said that when the null hypothesis is true (i.e., when <em><strong>p = p<sub>0<\/sub><\/strong><\/em>), the possible values of our test statistic (because it is a z-score) follow a standard normal (N(0,1), denoted by Z) distribution. Therefore, the p-value calculations (which assume that H<sub>o<\/sub>\u00a0is true) are simply standard normal distribution calculations for the three possible alternative hypotheses.<\/p>\r\n\r\n<div id=\"cb917aee58754a5cb3636aefb3294a0d\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Less Than<\/span><\/h2>\r\n<p id=\"df777795528345dba0210ed2e4428e54\">The probability of observing a test statistic as\u00a0<em class=\"italic\">small as that observed or smaller<\/em>, assuming that the values of the test statistic follow a standard normal distribution. We will now represent this probability in symbols and also using the normal distribution.<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"d0990c81beb04f838e356df68e500e1c\" class=\"img-responsive popimg aligncenter\" title=\"Ha: p &lt; p_0 \u21d2 p-value = P(Z \u2264 z)\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image258.gif\" alt=\"Ha: p &lt; p_0 \u21d2 p-value = P(Z \u2264 z)\" \/><\/span><\/span><span class=\"imagewrap\"><span class=\"image\"><img id=\"f3d076b51fb242d9b0b06eb29e9b968a\" class=\"img-responsive popimg aligncenter\" title=\"A normal distribution curve (N(0,1)). Marked on the horizontal axis are z-scores of 0 and z. 
z is to the left of 0 because the observed sample proportion is smaller than p_0. The p-value is the area to the left of z under the curve.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image259.gif\" alt=\"A normal distribution curve (N(0,1)). Marked on the horizontal axis are z-scores of 0 and z. z is to the left of 0 because the observed sample proportion is smaller than p_0. The p-value is the area to the left of z under the curve.\" \/><\/span><\/span>\r\n<p id=\"ea6ab08581a74cf185efde2813135049\">Looking at the shaded region, you can see why this is often referred to as a\u00a0<em class=\"italic\">left-tailed<\/em>\u00a0test. We shaded to the left of the test statistic because the alternative is \u201cless than.\u201d<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div id=\"d67cd76392ac4618bac7379972df6482\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Greater Than<\/span><\/h2>\r\n<p id=\"e0257e542a344125b2bd99d92c4f8b24\">The probability of observing a test statistic as\u00a0<em class=\"italic\">large as that observed or larger<\/em>, assuming that the values of the test statistic follow a standard normal distribution. Again, we will represent this probability in symbols and using the normal distribution.<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"bbc3b4575bdf41b9a1a617658bb4d509\" class=\"img-responsive popimg aligncenter\" title=\"Ha: p &gt; p_0 \u21d2 p-value = P(Z \u2265 z)\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image260.gif\" alt=\"Ha: p &gt; p_0 \u21d2 p-value = P(Z \u2265 z)\" \/><\/span><\/span><span class=\"imagewrap\"><span class=\"image\"><img id=\"d92d63e7f69f4904bae4b6b212a8e34c\" class=\"img-responsive popimg aligncenter\" title=\"A normal distribution curve (N(0,1)). 
Marked on the horizontal axis are z-scores of 0 and z. z is to the right of 0 because the observed sample proportion is larger than p_0. The p-value is the area to the right of z under the curve.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image261.gif\" alt=\"A normal distribution curve (N(0,1)). Marked on the horizontal axis are z-scores of 0 and z. z is to the right of 0 because the observed sample proportion is larger than p_0. The p-value is the area to the right of z under the curve.\" \/><\/span><\/span>\r\n<p id=\"bac6416321b4470f9f4de01f215df92e\">Looking at the shaded region, you can see why this is often referred to as a\u00a0<em class=\"italic\">right-tailed<\/em>\u00a0test. We shaded to the right of the test statistic because the alternative is \u201cgreater than.\u201d<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div id=\"e29acb0c945a4833b2d0137481005a12\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Not Equal To<\/span><\/h2>\r\n<p id=\"fe7971c638c943f6b4eea2b88a72e08c\">The probability of observing a test statistic that is as large in\u00a0<em class=\"italic\">magnitude<\/em>\u00a0as that observed or larger, assuming that the values of the test statistic follow a standard normal distribution.<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"d5bdb03b1cff410ca0b5e945209a9add\" class=\"img-responsive popimg aligncenter\" title=\"Ha: p \u2260 p_0 \u21d2 p-value = P(Z \u2264 -|z|) + P(Z \u2265 |z|) = 2P(Z \u2265 |z|)\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image262.gif\" alt=\"Ha: p \u2260 p_0 \u21d2 p-value = P(Z \u2264 -|z|) + P(Z \u2265 |z|) = 2P(Z \u2265 |z|)\" \/><\/span><\/span><span class=\"imagewrap\"><span class=\"image\"><img id=\"cb76b68b6701448f91fc98563666056e\" class=\"img-responsive popimg 
aligncenter\" title=\"A normal distribution curve (N(0,1)). Marked on the horizontal axis are z-scores of 0, -|z|, and |z|, where |z| and -|z| is the z-score of the observed test statistic. The p-value is the sum of the area to the right of |z| under the curve and the area to the left of -|z| under the curve.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image263.gif\" alt=\"A normal distribution curve (N(0,1)). Marked on the horizontal axis are z-scores of 0, -|z|, and |z|, where |z| and -|z| is the z-score of the observed test statistic. The p-value is the sum of the area to the right of |z| under the curve and the area to the left of -|z| under the curve.\" \/><\/span><\/span>\r\n<p id=\"e128aa7dba1747f5a40f5185299add84\">This is often referred to as a\u00a0<em class=\"italic\">two-tailed<\/em>\u00a0test, since we shaded in both directions.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div id=\"fc2e4d92fda74f04a8f608fde35ea1d9\" class=\"section purposewrap\">\r\n<div class=\"sectionContain\">\r\n<p id=\"a0d08d023b7845ab833c8f4289df2a87\">As noted earlier, before the widespread use of statistical software, it was common to use \u2018critical values\u2019 instead of p-values to assess the evidence provided by the data. Even though the critical values approach is not used in this course, students might find it insightful. Thus, the interested students are encouraged to review the critical value method in the following \u201cMany Students Wonder\u2026.\u201d link. If your instructor clearly states that you are required to have knowledge of the critical value method, you should definitely review the information.<\/p>\r\n<p id=\"d5f2219cfc2345dfa72d9fa627d60a90\">On the next page, we will apply the p-value to our three examples. 
But first, work through the following activities, which should help your understanding.<\/p>\r\n\r\n<div id=\"f2f809197ba144f1b5ef16866d02876a\" class=\"section section-learnbydoing\">\r\n<div class=\"sectionContain\">\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<div class=\"asx\">\r\n<div id=\"du4_m2_testprop6_tutor2\" class=\"activitywrap sectionNest flash\">\r\n<div class=\"actContain\">\r\n<div class=\"activity flash\">\r\n<div id=\"u4_m2_testprop6_tutor2\" class=\"flash_obj asx testFlash mark_flash\">\r\n<div id=\"ou4_m2_testprop6_tutor2\" class=\"page 2963160\">\r\n<div id=\"2963160\" class=\"question ddfb\">\r\n<div>\r\n<p id=\"N10076\">[h5p id=\"178\"]<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div class=\"asx\">\r\n<div id=\"du4_m2_testprop6_tutor3\" class=\"activitywrap sectionNest flash\">\r\n<div class=\"actContain\">\r\n<div class=\"activity flash\">\r\n<div id=\"u4_m2_testprop6_tutor3\" class=\"flash_obj asx testFlash mark_flash\">\r\n<div id=\"ou4_m2_testprop6_tutor3\" class=\"page 2963181\">\r\n<div id=\"2963181\" class=\"question ddfb\">\r\n<div>\r\n\r\nLet\u2019s return to the scenario where we are studying the population of part-time college students. We know that in 2008, 60% of this population was female. We are curious if the proportion has decreased this year. 
We test the hypotheses: H<sub>0<\/sub>: p = 0.60 and H<sub>a<\/sub>: p &lt; 0.60, where p is the proportion of part-time college students that are female this year.\r\n\r\n[h5p id=\"179\"]\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p id=\"f7428372517f413e991d5cd1dc6c96ed\">In each of the following questions, choose the pair(s) of hypotheses for the population proportion (p) and the z statistic that match the figure.<\/p>\r\n\r\n<div class=\"asx \">\r\n<div id=\"du4_m2_testprop6_tutor5\" class=\"activitywrap sectionNest flash\">\r\n<div class=\"activityhead\">\r\n<div class=\"activityinfo\"><\/div>\r\n<\/div>\r\n<div class=\"actContain\">\r\n<div class=\"activity flash\">\r\n<div id=\"u4_m2_testprop6_tutor5\" class=\"flash_obj asx testFlash mark_flash\">\r\n<div id=\"ou4_m2_testprop6_tutor5\" class=\"page 2963218 271724 2963219\">\r\n<div id=\"2963218\" class=\"question\">\r\n<div>\r\n\r\n<em>Question 1:<\/em>\r\n<div class=\"image shouldbeleft\"><img id=\"N10070\" class=\"img-responsive popimg aligncenter\" title=\"histogram with p-value filled in\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/webcontent\/inline_assessment\/_u5_m1_digtutor14a_image1.jpg\" alt=\"histogram with p-value filled in\" \/><\/div>\r\n<div>[h5p id=\"180\"]<\/div>\r\n<div><\/div>\r\n<div>\r\n\r\n<em>Question 2:<\/em>\r\n<div class=\"image shouldbeleft\"><img id=\"N10073\" class=\"img-responsive popimg aligncenter\" title=\"histogram with p values filled in\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/webcontent\/inline_assessment\/_u5_m1_digtutor14b_image1.jpg\" alt=\"histogram with p values filled in\" \/><\/div>\r\n<div>[h5p 
id=\"181\"]<\/div>\r\n<div><\/div>\r\n<div>\r\n\r\n<em>Question 3:<\/em>\r\n<div class=\"image shouldbeleft\"><img id=\"N10073\" class=\"img-responsive popimg aligncenter\" title=\"histogram with p value plugged in\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/webcontent\/inline_assessment\/_u5_m1_digtutor14c_image1.jpg\" alt=\"histogram with p value plugged in\" \/><\/div>\r\n<div>[h5p id=\"182\"]<\/div>\r\n<\/div>\r\n<div><\/div>\r\n<div>\r\n\r\n<em>Question 4:<\/em>\r\n<div class=\"image shouldbeleft\"><img id=\"N10073\" class=\"img-responsive popimg aligncenter\" title=\"histogram with p value filled in\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/webcontent\/inline_assessment\/_u5_m1_digtutor14d_image1.jpg\" alt=\"histogram with p value filled in\" \/><\/div>\r\n<div>[h5p id=\"183\"]<\/div>\r\n<div><\/div>\r\n<div>\r\n\r\n<em>Question 5:<\/em>\r\n<div class=\"image shouldbeleft\"><img id=\"N10073\" class=\"img-responsive popimg aligncenter\" title=\"histogram with p value filled in\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/webcontent\/inline_assessment\/_u5_m1_digtutor14e_image1.jpg\" alt=\"histogram with p value filled in\" \/><\/div>\r\n<div>[h5p id=\"184\"]<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div class=\"sectionContain\">\r\n<div id=\"ebfb7c516109415d948bf3e5fa49e5d8\" class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example 1<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<div>\r\n\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"a2869fb5f2af4d4cb70d6748ef6f020b\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population of products 
produced by the machine (following the repair). We want to know p, the proportion of defective products in this population. The two hypotheses are H_0: p = .20 and H_a: p &lt; .20. We take a sample of 400 products, represented by a smaller circle. We find that 64 of these are defective. p-hat = 64\/400 = .16, and z = -2.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image264.gif\" alt=\"A large circle represents the population of products produced by the machine (following the repair). We want to know p, the proportion of defective products in this population. The two hypotheses are H_0: p = .20 and H_a: p &lt; .20. We take a sample of 400 products, represented by a smaller circle. We find that 64 of these are defective. p-hat = 64\/400 = .16, and z = -2.\" \/><\/span><\/span>\r\n<p id=\"a63a95e3c35c42bfaf0691b627e449fe\">The p-value in this case is:<\/p>\r\n<p id=\"e5906b3baaaf42b8b1a2023d471873f2\">* The probability of observing a test statistic as small as -2 or smaller, assuming that H<sub>o<\/sub>\u00a0is true.<\/p>\r\n<p id=\"fca7105a2cbd4b23b0e93e4f5eb104af\"><em class=\"italic\">OR (recalling what the test statistic actually means in this case),<\/em><\/p>\r\n<p id=\"a4ac020c5924401482a3757b0f7d5539\">* The probability of observing a sample proportion that is 2 standard deviations or more below <em><strong>p<sub>0<\/sub> = .20<\/strong><\/em>, assuming that <em><strong>p<sub>0<\/sub><\/strong><\/em>\u00a0is the true population proportion.<\/p>\r\n<p id=\"df8ce4c2b7bf4962b0223dfe4298f7d3\"><em class=\"italic\">OR, more specifically,<\/em><\/p>\r\n<p id=\"ab0b73b133b84ab0a0a5d735f2786782\">* The probability of observing a sample proportion of .16 or lower in a random sample of size 400, when the true population proportion is <em><strong>p<sub>0<\/sub> = .20<\/strong><\/em>.<\/p>\r\n<p id=\"ec99adbf019148bb9e9d41d734f41eef\">In 
either case, the p-value is found as shown in the following figure:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"df7d3f70bcbf4c608201aa1802a2d7fb\" class=\"img-responsive popimg aligncenter\" title=\"A normal N(0,1) curve. Marked on the horizontal axis are z-scores of 0 and -2. We are interested in the area to the left of -2, which is the p-value.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image266.gif\" alt=\"A normal N(0,1) curve. Marked on the horizontal axis are z-scores of 0 and -2. We are interested in the area to the left of -2, which is the p-value.\" \/><\/span><\/span>\r\n<p id=\"d0fa3683eb53437bb9ced6335f8fb3f7\">To find\u00a0[latex]P(Z\\leq -2)[\/latex], we can use either a table or software. Eventually, after we understand the details, we will use software to run the test for us, and the output will give us all the information we need. The p-value that the statistical software provides for this specific example is 0.023. This p-value tells us that it is fairly unlikely (probability .023) to get data like those observed (a test statistic of -2 or less) assuming that H<sub>o<\/sub>\u00a0is true.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div id=\"f595583cdd1e450b83dda29803c40ac3\" class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example 2<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<div>\r\n\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"e0f2eca37d7a4fd9b768648e9117c76a\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population of students at the college. We want to know p, the proportion of students in this population who use marijuana. The hypotheses are H_0: p = .157 and H_a: p &gt; .157. 
We take a sample of 100 students, represented by a smaller circle. We find that 19 use marijuana. p-hat = 19\/100 = .19, and z = .91\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image268.gif\" alt=\"A large circle represents the population of students at the college. We want to know p about this population, or what is the population proportion of students using marijuana. The hypotheses are H_0: p = .157 and H_a: p &gt; .157. We take a sample of 100 students, represented by a smaller circle. We find that 19 use marijuana. p-hat = 19\/100 = .19, and z = .91\" \/><\/span><\/span>\r\n<p id=\"ff8d175956d24f10b6f040538e15ca29\">The p-value in this case is:<\/p>\r\n<p id=\"b0c649a511464ed398534341d68d83e2\">* The probability of observing a test statistic as large as .91 or larger, assuming that H<sub>o<\/sub>\u00a0is true.<\/p>\r\n<p id=\"d2177200ec6f41bd96ff58c72280a01d\"><em class=\"italic\">OR (recalling what the test statistic actually means in this case),<\/em><\/p>\r\n<p id=\"df97f7f04e174687b6ceb463f3f1f66d\">* The probability of observing a sample proportion that is .91 standard deviations or more above <em><strong>p<sub>0<\/sub> = .157<\/strong><\/em>, assuming that <em><strong>p<sub>0<\/sub><\/strong><\/em>\u00a0is the true population proportion.<\/p>\r\n<p id=\"bd91fc990515403bb76042e5137224b3\"><em class=\"italic\">OR, more specifically,<\/em><\/p>\r\n<p id=\"f895853a80e744fea33ad21eb1d84736\">* The probability of observing a sample proportion of .19 or higher in a random sample of size 100, when the true population proportion is <em><strong>p<sub>0<\/sub> = .157<\/strong><\/em>.<\/p>\r\n<p id=\"dea71678f5dc4e40a076ecb6d8c50132\">In either case, the p-value is found as shown in the following figure:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"b4f205e0678648cea22dda0709d5f793\" class=\"img-responsive popimg aligncenter\" title=\"A N(0,1) 
curve for the sampling distribution. Marked on the horizontal axis are z-scores of 0 and .91. The p-value is the area under the curve to the right of .91.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image270.gif\" alt=\"A N(0,1) curve for the sampling distribution. Marked on the horizontal axis are z-scores of 0 and .91. The p-value is the area under the curve to the right of .91.\" \/><\/span><\/span>\r\n<p id=\"cb25395649734b3fa388237ee0dca943\">Again, at this point we can either use a table or software to find that the p-value is 0.182.<\/p>\r\n<p id=\"ae4c3c3ea1274a3fa71edf9c5c9d1625\">The p-value tells us that it is not very surprising (probability of .182) to get data like those observed (which yield a test statistic of .91 or higher) assuming that the null hypothesis is true.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div id=\"efa2425e9af64d34864bbe647ca13b8a\" class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example 3<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<div>\r\n\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"efa3cf69adf648cba0251dd4e40542af\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population of U.S. adults. We want to know p about this population, the proportion of U.S. adults who support the death penalty. The two hypotheses are H_0: p = .64 and H_a: p \u2260 .64. We take a sample of 1000 U.S. adults, represented by a smaller circle. We find that 675 are in favor. p-hat = 675\/1000 = .675, and z = 2.31\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image271.gif\" alt=\"A large circle represents the population of U.S. adults. 
We want to know p about this population, the proportion of U.S. adults who support the death penalty. The two hypotheses are H_0: p = .64 and H_a: p \u2260 .64. We take a sample of 1000 U.S. adults, represented by a smaller circle. We find that 675 are in favor. p-hat = 675\/1000 = .675, and z = 2.31\" \/><\/span><\/span>\r\n<p id=\"c84fe337625144f7be1e4147ad0e5701\">The p-value in this case is:<\/p>\r\n<p id=\"ee1327e8022c470391855d5afc926492\">* The probability of observing a test statistic as large as 2.31 (or larger) or as small as -2.31 (or smaller), assuming that H<sub>o<\/sub>\u00a0is true.<\/p>\r\n<p id=\"cd61654a968f4421afe55b336fa87169\"><em class=\"italic\">OR (recalling what the test statistic actually means in this case),<\/em><\/p>\r\n<p id=\"b2a06757e5564056a5413435f40802ca\">* The probability of observing a sample proportion that is 2.31 standard deviations or more away from <em><strong>p<sub>0<\/sub> = .64<\/strong><\/em>, assuming that <em><strong>p<sub>0<\/sub><\/strong><\/em> is the true population proportion.<\/p>\r\n<p id=\"ad05dcc349a247cfa433d3e7ac2ccae7\"><em class=\"italic\">OR, more specifically,<\/em><\/p>\r\n<p id=\"f1bb56a0896a49f78c28f830d4dc1582\">* The probability of observing a sample proportion as different as .675 is from .64, or even more different (i.e., as high as .675 or higher or as low as .605 or lower) in a random sample of size 1,000, when the true population proportion is <em><strong>p<sub>0<\/sub> = .64<\/strong><\/em>.<\/p>\r\n<p id=\"abc0dfb58dd3478d9f0e9d4729b27297\">In either case, the p-value is found as shown in the following figure:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"aed13d933fdd4015a9069e5b012c0235\" class=\"img-responsive popimg aligncenter\" title=\"A N(0,1) sampling distribution curve, with the z-scores -2.31, 0, and 2.31 marked on the horizontal axis. 
The p-value is the sum of the area under the curve to the left of -2.31 and the area under the curve to the right of 2.31\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image274.gif\" alt=\"A N(0,1) sampling distribution curve, with the z-scores -2.31, 0, and 2.31 marked on the horizontal axis. The p-value is the sum of the area under the curve to the left of -2.31 and the area under the curve to the right of 2.31\" \/><\/span><\/span>\r\n<p id=\"ddc51e50ac2b44f8ad7f5a6b386c03a2\">Again, at this point we can either use a table or software to find that the p-value is 0.021.<\/p>\r\n<p id=\"cd0d91186fc44e95ac48242d74850877\">The p-value tells us that it is pretty unlikely (probability of .021) to get data like those observed (test statistic as high as 2.31 or higher or as low as -2.31 or lower) assuming that H<sub>o<\/sub>\u00a0is true.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div id=\"b73ae5f240084e35b339b6be875c984d\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Comment<\/span><\/h2>\r\n<p id=\"e61d4b9728f04f6fa50ec307aa870176\">We\u2019ve just seen that finding p-values involves probability calculations about the value of the test statistic assuming that H<sub>o<\/sub>\u00a0is true. In this case, when H<sub>o<\/sub>\u00a0is true and the conditions of the test are met, the values of the test statistic approximately follow a standard normal distribution (i.e., the sampling distribution of the test statistic when the null hypothesis is true is approximately N(0,1)). Therefore, p-values correspond to areas (probabilities) under the standard normal curve.<\/p>\r\n<p id=\"ea406fa3a28e4bf8b906d34db64f8c01\">More generally, in\u00a0<em class=\"italic\">any test<\/em>, p-values are found using the sampling distribution of the test statistic when the null hypothesis is true (also known as the \u201cnull distribution\u201d of the test statistic). 
In this case, it was relatively easy to argue that the null distribution of our test statistic is approximately N(0,1). As we\u2019ll see, other tests involve other distributions (like the t-distribution and the F-distribution); we will mention these only briefly and will rely heavily on the output of our statistical package to obtain the p-values.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<p id=\"b8bfb6512feb4cdebdb9ae44125dbd83\">We\u2019ve just completed our discussion about the p-value, and how it is calculated both in general and more specifically for the z-test for the population proportion. Let\u2019s go back to the four-step process of hypothesis testing and see what we\u2019ve covered and what still needs to be discussed.<\/p>\r\n\r\n<div id=\"b613155bdef5459fbab7f793866f10fb\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">The Four Steps in Hypothesis Testing<\/span><\/h2>\r\n<ol id=\"c953f8e4073e4834a34684ad0b55a21e\">\r\n \t<li>\r\n<p id=\"ec2f65a550fe4e9eb9100de46bf2e880\">State the appropriate null and alternative hypotheses, H<sub>o<\/sub>\u00a0and H<sub>a<\/sub>.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"f9afb2dd56c6402b8a489ca12fdb1ef4\">Obtain a random sample, collect relevant data, and\u00a0<em class=\"italic\">check whether the data meet the conditions under which the test can be used.<\/em>\u00a0If the conditions are met, summarize the data using a test statistic.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"f9a379463de44ef0ad3d75b936427a70\">Find the p-value of the test.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"db2f452c95b544bba06febb3489477f0\">Based on the p-value, decide whether or not the results are significant, and\u00a0<em class=\"italic\">draw your conclusions in context.<\/em><\/p>\r\n<\/li>\r\n<\/ol>\r\n<p id=\"d898ff3c00d64528a801c12011ef9406\">With respect to the z-test for the population proportion:<\/p>\r\n<p id=\"a6de07e042484a08b6a5603d3b43ddab\">Step 1: Completed<\/p>\r\n<p id=\"db79461b02a3461ea5433d8fe98a28c3\">Step 2: 
Completed<\/p>\r\n<p id=\"be3bf3b447bf48858bc5030cd95309db\">Step 3: Completed<\/p>\r\n<p id=\"d30f78a9a76948a1880f96a37954e14e\">Step 4: This is what we will work on next.<\/p>\r\n\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p id=\"a4c47efe7a9e461c907640c1e39ac328\">In 2007, a Gallup poll estimated that 45% of U.S. adults rated their financial situation as \u201cgood.\u201d We want to know if the proportion is smaller this year. We gather a random sample of 100 U.S. adults this year and find that 39 rate their financial situation as \u201cgood.\u201d Use the output from Minitab to complete the following statements about the p-value. Use numbers from the output to fill in the blanks.<\/p>\r\n<p id=\"ddff7421df64414f91af0b6a4a534feb\"><span class=\"imagewrap\"><span class=\"image\"><img id=\"a4938aa641ad4276a7a22c77b4aeb759\" class=\"img-responsive popimg aligncenter\" title=\"Test and CI for One Proportion. Test of p = 0.45 vs p &lt; 0.45 Sample: 1: X = 39 N = 100 Sample p = 0.390000 95% Upper Bound = 0.485600 Z-Value = -1.21 P-Value = 0.114\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/1_img5.gif\" alt=\"Test and CI for One Proportion. Test of p = 0.45 vs p &lt; 0.45 Sample: 1: X = 39 N = 100 Sample p = 0.390000 95% Upper Bound = 0.485600 Z-Value = -1.21 P-Value = 0.114\" \/><\/span><\/span><\/p>\r\n[h5p id=\"185\"]\r\n\r\nDo zinc supplements reduce a child's risk of catching a cold? A medical study reports a p-value of 0.03. Are the following interpretations of the p-value valid or invalid?\r\n\r\n[h5p id=\"186\"]\r\n\r\n<\/div>\r\n<\/div>\r\n<h2><span title=\"Quick scroll up\">4. 
Drawing Conclusions Based on the p-Value<\/span><\/h2>\r\n<p id=\"b64a5eb3c2044b82a168b1a6bea7c14c\">This last part of the four-step process of hypothesis testing is the same across all statistical tests. We\u2019ve already said essentially everything there is to say about it, but it is worth repeating.<\/p>\r\n<p id=\"ff51cd1b831b4377bf08a6dd285e569b\">The p-value is a measure of how much evidence the data present against H<sub>o<\/sub>. The smaller the p-value, the more evidence the data present against H<sub>o<\/sub>.<\/p>\r\n<p id=\"d635a8139eec43bfa1d2aea354449653\">We already mentioned that the\u00a0<em class=\"italic\">significance level<\/em>\u00a0(\u03b1) determines what constitutes enough evidence against H<sub>o<\/sub>: it is a cutoff point below which the p-value is considered small enough to reject H<sub>o<\/sub>\u00a0in favor of H<sub>a<\/sub>. The most commonly used significance level is 0.05.<\/p>\r\n<p id=\"bcbc95d524fb4e49ae9c3ccd7ad1c108\">It is important to mention again that this step has essentially two sub-steps:<\/p>\r\n\r\n<ol id=\"bd988bdb3cd947e3a393833b7f9d6f62\">\r\n \t<li>\r\n<p id=\"ee8e6201e8974feea6d78e3fab52d303\">Based on the p-value, determine whether or not the results are significant (i.e., whether the data present enough evidence to reject H<sub>o<\/sub>).<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"af462cb25585a4baf863d81f43044396b\">State your conclusions in the context of the problem.<\/p>\r\n<\/li>\r\n<\/ol>\r\n<p id=\"de48212f805042b6acfaad6ce8000b10\">Let\u2019s go back to our three examples and draw conclusions.<\/p>\r\n\r\n<div id=\"ae341756210d417bac882287c04898da\" class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example 1<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<div>\r\n<p id=\"ec69ab7db98e4450acf3a303fa02e3bc\">(Has the proportion of defective products been 
reduced from 0.20 as a result of the repair?)<\/p>\r\n<p id=\"a2756dad13ba4bbcbce3b07560ef653e\">We found that the p-value for this test was 0.023.<\/p>\r\n<p id=\"abc0a8ec01424a1b9b4c089b1a907e08\">Since 0.023 is small (in particular, 0.023 &lt; 0.05), the data provide enough evidence to reject H<sub>o<\/sub>\u00a0and to conclude that, as a result of the repair, the proportion of defective products has been reduced to below 0.20. The following figure tells the complete story of this example, and includes all the steps we went through, starting from stating the hypotheses and ending with our conclusions:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"c35ded4b8f5440248f282bc582fb3b0b\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population of products produced by the machine (following the repair). We want to know p about this population, or what is the proportion of defective products. The two hypotheses are H_0: p = 0.20 and H_a: p &lt; 0.20. We take a sample of 400 products, represented by a smaller circle. We find that 64 of these are defective. p-hat = 64\/400 = 0.16, z = -2, and p-value = 0.023. Since the p-value is small, we conclude that H_0 can be rejected.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image275.gif\" alt=\"A large circle represents the population of products produced by the machine (following the repair). We want to know p about this population, or what is the proportion of defective products. The two hypotheses are H_0: p = 0.20 and H_a: p &lt; 0.20. We take a sample of 400 products, represented by a smaller circle. We find that 64 of these are defective. p-hat = 64\/400 = 0.16, z = -2, and p-value = 0.023. 
Since the p-value is small, we conclude that H_0 can be rejected.\" \/><\/span><\/span>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div id=\"f89a74d4c3e4453684a436c74253577f\" class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example 2<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<div>\r\n<p id=\"f81ca8cc68d74431bfc94e7f3b3ecaa6\">(Is the proportion of students who use marijuana at the college higher than the national proportion, which is 0.157?)<\/p>\r\n<p id=\"ba3e73661fbf4b74a06c8ca1e8ed6090\">We found that the p-value for this test was 0.182.<\/p>\r\n<p id=\"dcdbc9d50b504c1c881a9e41b46e9362\">Since 0.182 is\u00a0<em class=\"italic\">not<\/em>\u00a0small (in particular, 0.182 &gt; 0.05), the data do not provide enough evidence to reject H<sub>o<\/sub>.<\/p>\r\n<p id=\"c435390a011a4739a2702214f79175e7\">We therefore do\u00a0<em class=\"italic\">not<\/em>\u00a0have enough evidence to conclude that the proportion of students at the college who use marijuana is higher than the national figure. Here is the complete story of this example:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"ca3522fbdd28417d99c4fdd75c02d34e\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population of students at the college. We want to know p about this population, or what is the population proportion of students using marijuana. The hypotheses are H_0: p = 0.157 and H_a: p &gt; 0.157. We take a sample of 100 students, represented by a smaller circle. We find that 19 use marijuana. p-hat = 19\/100 = 0.19, z = 0.91, and p-value = 0.182. 
Since the p-value is too large, we conclude that H_0 cannot be rejected.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image276.gif\" alt=\"A large circle represents the population of students at the college. We want to know p about this population, or what is the population proportion of students using marijuana. The hypotheses are H_0: p = 0.157 and H_a: p &gt; 0.157. We take a sample of 100 students, represented by a smaller circle. We find that 19 use marijuana. p-hat = 19\/100 = 0.19, z = 0.91, and p-value = 0.182. Since the p-value is too large, we conclude that H_0 cannot be rejected.\" \/><\/span><\/span>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example 3<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<div class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"example clearfix\">\r\n<div>\r\n<p id=\"ecfdfd05b3144451a52d4ea315a7de08\">(Has the proportion of U.S. adults who support the death penalty for convicted murderers changed since 2003, when it was 0.64?)<\/p>\r\n<p id=\"a0c917c58df84dec8ee1eff50fce2cc1\">We found that the p-value for this test was 0.021.<\/p>\r\n<p id=\"b880f01c5252410e971fbcd8c7e4a8ea\">Since 0.021 is small (in particular, 0.021 &lt; 0.05), the data provide enough evidence to reject H<sub>o<\/sub>, and we conclude that the proportion of U.S. adults who support the death penalty for convicted murderers has changed since 2003. Here is the complete story of this example:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"dc9ecf582d144e098cb40010d5f89159\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population of U.S. adults. 
We want to know p about this population, the proportion of U.S. adults who support the death penalty. The two hypotheses are H_0: p = 0.64 and H_a: p \u2260 0.64. We take a sample of 1000 U.S. adults, represented by a smaller circle. We find that 675 are in favor. p-hat = 675\/1000 = 0.675, z = 2.31, and p-value = 0.021. Because the p-value is small, we conclude that H_0 can be rejected.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image277.gif\" alt=\"A large circle represents the population of U.S. adults. We want to know p about this population, the proportion of U.S. adults who support the death penalty. The two hypotheses are H_0: p = 0.64 and H_a: p \u2260 0.64. We take a sample of 1000 U.S. adults, represented by a smaller circle. We find that 675 are in favor. p-hat = 675\/1000 = 0.675, z = 2.31, and p-value = 0.021. Because the p-value is small, we conclude that H_0 can be rejected.\" \/><\/span><\/span>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p id=\"e243a016e06549f091b001fe4cf93228\">Two hypothesis tests were conducted.<\/p>\r\n<p id=\"aec5c9ee0f0d4cd18aefced39a34e765\">In test I, a significance level of 0.05 was used, and the p-value was calculated to be 0.025.<\/p>\r\n<p id=\"c2c11c99e783458e826aa39983762583\">In test II, a significance level of 0.01 was used, and the p-value was calculated to be 0.025.<\/p>\r\n[h5p id=\"187\"]\r\n\r\n<\/div>\r\n<\/div>\r\n<h2><span title=\"Quick scroll up\">Let\u2019s Summarize<\/span><\/h2>\r\n<p 
id=\"d42407903ecf43318cbd764312a599b9\">We have now gone through all four steps of hypothesis testing and, in particular, learned how they are applied to the z-test for the population proportion. Let\u2019s briefly summarize:<\/p>\r\n\r\n<div id=\"d55b5437117e43ba852d508db9c42510\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h3>Step 1<\/h3>\r\n<p id=\"f2816838a5634b519c932f7a5e3d56bb\">State the null and alternative hypotheses:<\/p>\r\n<p id=\"fb7890857e4a4ed19d56084b76292576\"><em><strong>H<sub>0<\/sub> : p = p<sub>0<\/sub><\/strong><\/em><\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"cb280e84790b4a1db00662fc3a142b62\" class=\"img-responsive popimg aligncenter\" title=\"H_a : p { one of &lt;, &gt;, or \u2260 } p_0\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image279.gif\" alt=\"H_a : p { one of &lt;, &gt;, or \u2260 } p_0\" \/><\/span><\/span>\r\n<p id=\"bd4a66c940df4841b46a6a145eb5df38\">where the choice of the appropriate alternative (out of the three) is usually quite clear from the context of the problem.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div id=\"c7d8da069b8a48c3974ec740003586f2\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h3>Step 2<\/h3>\r\n<p id=\"f34343af40e44e91974e80bcfeb147c8\">Obtain data from a sample and:<\/p>\r\n<p id=\"dec1adf56aec4ee0ad751ac188e4786f\">(i) Check whether the data satisfy the conditions that allow you to use this test.<\/p>\r\n\r\n<ul id=\"ce72d82156174778a7e8a5ec9dc1f9df\">\r\n \t<li>\r\n<p id=\"ac78a74eea82c4a7ca88f27a3a2b2b054\">Random sample (or at least a sample that can be considered random in context)<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"b46509965f0a425398caa13f47266770\"><em class=\"italic\">n<\/em>\u00a0\u22c5\u00a0<em class=\"italic\">p<\/em><sub>0<\/sub>\u00a0\u2265 10,\u00a0<em class=\"italic\">n<\/em>\u00a0\u22c5 (1 \u2212\u00a0<em 
class=\"italic\">p<\/em><sub>0<\/sub>) \u2265 10<\/p>\r\n<\/li>\r\n<\/ul>\r\n<p id=\"c9f979c7c26442e2bf03aa2377cfe365\">(ii) Calculate the sample proportion\u00a0[latex]\\hat{p}[\/latex], and summarize the data using the test statistic:<\/p>\r\n[latex]z=\\frac{\\hat{p}-p_0}{\\sqrt{\\frac{p_0\\left(1-p_0\\right)}{n}}}[\/latex]\r\n<p id=\"d91c5fff2150425490ecf42b82b26aa4\">(<em class=\"italic\">Recall:<\/em> This standardized test statistic represents how many standard deviations above or below <em><strong>p<sub>0<\/sub><\/strong><\/em> our sample proportion\u00a0[latex]\\hat{p}[\/latex]\u00a0is.)<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div id=\"bac252af0fda4bfdab029458d52af720\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h3>Step 3<\/h3>\r\n<p id=\"f817525d217043b895e9a90f569bb09c\">Find the p-value of the test either by using software or by using the test statistic as follows:<\/p>\r\n<p id=\"af4fbc3465f7428797f1ce00afaa2df1\">* for H<sub>a<\/sub>: p &lt; p<sub>0<\/sub>:\u00a0[latex]P(Z\\leq z)[\/latex]<\/p>\r\n<p id=\"eec1af4c1f904bf4b86c8b79775d1d32\">* for H<sub>a<\/sub>: p &gt; p<sub>0<\/sub>:\u00a0[latex]P(Z\\geq z)[\/latex]<\/p>\r\n<p id=\"f1fc66c0e14f467b9cab5400d261ff10\">* for H<sub>a<\/sub>: p \u2260 p<sub>0<\/sub>:\u00a0[latex]2P(Z\\geq |z|)[\/latex]<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div id=\"c4b9fd05f441429b823a24b5728c06a1\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h3>Step 4<\/h3>\r\n<p id=\"cf809820701b4652b37a8c13f389162c\">Reach a conclusion first regarding the significance of the results, and then determine what it means in the context of the problem. Recall that:<\/p>\r\n<p id=\"c3936568042e43148b510f132f9de8dd\">If the p-value is small (in particular, smaller than the significance level, which is usually .05), the results are significant (in the sense that there is a significant difference between what was observed in the sample and what was claimed in H<sub>o<\/sub>), and so we reject H<sub>o<\/sub>. If the p-value is not small, we do not have enough statistical evidence to reject H<sub>o<\/sub>, and so we continue to believe that H<sub>o<\/sub>\u00a0<em class=\"italic\">may<\/em>\u00a0be true. (Remember, in hypothesis testing we never \u201caccept\u201d H<sub>o<\/sub>.)<\/p>\r\n\r\n\r\n<hr \/>\r\n\r\n<div id=\"ccf9c0ac251147498f83f2ce4f1e43d2\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">More About Hypothesis Testing<\/span><\/h2>\r\n<p id=\"d1de7d8a6ee54f35b1639026f77778bf\">The issues regarding hypothesis testing that we will discuss are:<\/p>\r\n<p id=\"dae9ef20b54c4221b3fd02ff4387d175\">1. The effect of sample size on hypothesis testing.<\/p>\r\n<p id=\"dcbc1c90bae5491190b2c06be4ac3f16\">2. Statistical significance vs. practical importance. (This will be discussed in the activity following number 1.)<\/p>\r\n<p id=\"d424fa70d9ab4dc18e88a82916584364\">3. One-sided alternative vs. two-sided alternative\u2014understanding what is going on.<\/p>\r\n<p id=\"cfdaaa74ebec4a2c83ce8c1c62afba28\">4. 
Hypothesis testing and confidence intervals\u2014how are they related?<\/p>\r\n<p id=\"e40d1afc08774b1db1e780568006fd0f\">Let\u2019s start.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div id=\"fb253aa09da44e98b1ba6e7804de87ee\" class=\"section purposewrap\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">1. The Effect of Sample Size on Hypothesis Testing<\/span><\/h2>\r\n<p id=\"c3d4c71067d84b819b7dbb812d65a3cd\">We have already seen the effect that sample size has on inference when we discussed point and interval estimation for the population mean (\u03bc) and population proportion (p). Intuitively\u2026<\/p>\r\n<p id=\"eb21ab0a96ef471a85a54840c1cf5ff2\">Larger sample sizes give us more information to pin down the true nature of the population. We can therefore expect the\u00a0<em class=\"italic\">sample<\/em>\u00a0mean and\u00a0<em class=\"italic\">sample<\/em>\u00a0proportion obtained from a larger sample to tend to be closer to the population mean and proportion, respectively. As a result, for the same level of confidence, we can report a smaller margin of error and get a narrower confidence interval. What we\u2019ve seen, then, is that a larger sample size increases how much we can trust our sample results. In hypothesis testing, larger sample sizes have a similar effect. The following two examples will illustrate that a larger sample size provides more convincing evidence and show how that evidence manifests itself in hypothesis testing. 
Let\u2019s go back to our example 2 (marijuana use at a certain liberal arts college).<\/p>\r\n\r\n<div id=\"c0859fada2474e819f8fc4855c696b48\" class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<h4>2<\/h4>\r\n<div>\r\n\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"d25bf27d5d664d609e4ca7f347d15cba\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population Students at the college. We want to know p about this population, or what is the population proportion of students using marijuana. The hypotheses are H_0: p = .157 and H_a: p &amp;gt; .157 . We take a sample of 100 students, represented by a smaller circle. We find that 19 use marijuana. p-hat = 19\/100 = .19, z = .91, and p-value = .182 . Since the p-value is too large we conclude that H_0 cannot be rejected.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image276.gif\" alt=\"A large circle represents the population Students at the college. We want to know p about this population, or what is the population proportion of students using marijuana. The hypotheses are H_0: p = .157 and H_a: p &amp;gt; .157 . We take a sample of 100 students, represented by a smaller circle. We find that 19 use marijuana. p-hat = 19\/100 = .19, z = .91, and p-value = .182 . Since the p-value is too large we conclude that H_0 cannot be rejected.\" \/><\/span><\/span>\r\n<p id=\"f0f7566de64f4531835017741d5f9fe6\">The data\u00a0<em class=\"italic\">do not<\/em>\u00a0provide enough evidence that the proportion of marijuana users at the college is higher than the proportion among all U.S. college students, which is .157. So far, nothing new. Let\u2019s make small changes to the problem (and call it example 2*). 
The changes are highlighted and the problem is followed by a new figure that reflects the changes.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div id=\"da5d2f88278f46eeb036ac9c1b10c8bd\" class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<h4>2*<\/h4>\r\n<div>\r\n<p id=\"fa53fc6608964e8c9ed1b4a95eb64e32\">There are rumors that students at a certain liberal arts college are more inclined to use drugs than U.S. college students in general. Suppose that\u00a0<em class=\"italic\">in a simple random sample of 400 students from the college, 76 admitted to marijuana use<\/em>. Do the data provide enough evidence to conclude that the proportion of marijuana users among the students in the college (p) is\u00a0<em class=\"italic\">higher<\/em>\u00a0than the national proportion, which is .157? (This number is reported by the Harvard School of Public Health.)<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"e2189d25215d416f8ff97c9b17b98931\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population Students at the college. We want to know p about this population, or what is the population proportion of students using marijuana. The hypotheses are H_0: p = .157 and H_a: p &amp;gt; .157 . We take a sample of 400 students, represented by a smaller circle, and find that 76 use marijuana.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image288.gif\" alt=\"A large circle represents the population Students at the college. We want to know p about this population, or what is the population proportion of students using marijuana. The hypotheses are H_0: p = .157 and H_a: p &amp;gt; .157 . 
We take a sample of 400 students, represented by a smaller circle, and find that 76 use marijuana.\" \/><\/span><\/span>\r\n<p id=\"b596f3851d3b4c9fb69ec3bccc413778\">We now have a larger sample (400 instead of 100), and the number of marijuana users has changed accordingly (76 instead of 19).<\/p>\r\n<p id=\"f405381e926e4a5180579996f478fedd\">Let\u2019s carry out the test in this case.<\/p>\r\n<p id=\"c4599ef6dd7d42609e54aa40e22fedb9\"><em class=\"italic\">I.<\/em>\u00a0The question of interest did not change, so we are testing the same hypotheses:<\/p>\r\n<p id=\"b58804fb8f8b446ebf7210d5a6b5cdf2\">H<sub>o<\/sub>: p = .157<\/p>\r\n<p id=\"b00c07bc7329439abca95c2b6d3999e2\">H<sub>a<\/sub>: p &gt; .157<\/p>\r\n<p id=\"c51d7674471c4672925d0993b071213e\"><em class=\"italic\">II.<\/em>\u00a0We select a random sample of size\u00a0<em class=\"italic\">400<\/em>\u00a0and find that 76 are marijuana users.<\/p>\r\n<p id=\"eaa615f5f17b4d55944815211027368a\">(Note that the data satisfy the conditions that allow us to use this test. 
Verify this yourself.)<\/p>\r\n<p id=\"e014135b7d0241358dc4e9d4525911c1\">Let\u2019s summarize the data:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"eadb30fd3870434fa542bd7d8284aecf\" class=\"img-responsive popimg aligncenter\" title=\"p-hat = 76\/400 = .19\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image289.gif\" alt=\"p-hat = 76\/400 = .19\" \/><\/span><\/span>\r\n<p id=\"e8d8b11c459248e1a1e8fcd7ba22dd4f\">This is the same sample proportion as in the original problem, so it seems that the data give us the same evidence, but calculating the test statistic shows that this is not the case:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"b9f4178a594648478e03f6737a6c1401\" class=\"img-responsive popimg aligncenter\" title=\"z = (.19 - .157) \/ \u221a[( .157 (1 - .157) )\/400 ] \u2248 1.81\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image290.gif\" alt=\"z = (.19 - .157) \/ \u221a[( .157 (1 - .157) )\/400 ] \u2248 1.81\" \/><\/span><\/span>\r\n<p id=\"ae79ea390c4b4c439015e57910d00de7\">Even though the sample proportion is the same (.19), here it is based on a larger sample (400 instead of 100), so it is 1.81 standard deviations above the null value of .157 (as opposed to .91 standard deviations in the original problem). The reason is that the standard deviation of the sample proportion under H<sub>o<\/sub>, \u221a[.157(1 - .157)\/n], is half as large when n = 400 as when n = 100, so the same difference of .19 - .157 = .033 amounts to twice as many standard deviations.<\/p>\r\n<p id=\"ff3b5a6ceed042c9a3f37ecdb432dbdb\"><em class=\"italic\">III.<\/em>\u00a0For the p-value, we use statistical software to find p-value = 0.035.<\/p>\r\n<p id=\"b9ed7681b8004c97b21f0169920ab9a8\">The p-value here is .035 (as opposed to .182 in the original problem). In other words, when H<sub>o<\/sub>\u00a0is true (i.e. 
when p = .157), it is quite unlikely to get a sample proportion of .19 or higher based on a sample of size 400 (probability .035), but not very unlikely when the sample size is 100 (probability .182).<\/p>\r\n<p id=\"ce9f11eb2d9a4b4b8d89b4835273ca95\"><em class=\"italic\">IV.<\/em><\/p>\r\n<p id=\"cb9bf442c3fd46258ef7995f0fd3139c\">Our results here are significant at the .05 level. In other words, in example 2* the data provide enough evidence to reject H<sub>o<\/sub>\u00a0and conclude that the proportion of marijuana users at the college is higher than among all U.S. college students.<\/p>\r\n<p id=\"ec6f32f6743e4553bb8e8f5743f08282\">Let\u2019s summarize with a figure:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"b3bc15591ffe4a668340363a70a425a4\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population Students at the college. We want to know p about this population, or what is the population proportion of students using marijuana. The hypotheses are H_0: p = .157 and H_a: p &amp;gt; .157 . We take a sample of 400 students, represented by a smaller circle, and find that 76 use marijuana. Conditions are met to use our method, so p-hat = 76\/400 = .19, z = 1.81, and p-value = .035 . The p-value is low enough to let us conclude that we can reject H_0.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image291.gif\" alt=\"A large circle represents the population Students at the college. We want to know p about this population, or what is the population proportion of students using marijuana. The hypotheses are H_0: p = .157 and H_a: p &amp;gt; .157 . We take a sample of 400 students, represented by a smaller circle, and find that 76 use marijuana. Conditions are met to use our method, so p-hat = 76\/400 = .19, z = 1.81, and p-value = .035 . 
The p-value is low enough to let us conclude that we can reject H_0.\" \/><\/span><\/span>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<p id=\"acb8dbf1861244fa95d7ea48986f7c0b\">What do we learn from these two examples?<\/p>\r\n<p id=\"a06ab0a108c246b0ba7ec441101f51c1\">We see that sample results based on a larger sample carry more weight.<\/p>\r\n<p id=\"e2130163634548ba88a5d407e31e1782\">In example 2, we saw that a sample proportion of .19 based on a sample of size 100 was not enough evidence that the proportion of marijuana users in the college is higher than .157. Recall, from our general overview of hypothesis testing, that this conclusion (not having enough evidence to reject the null hypothesis)\u00a0<em class=\"italic\">doesn\u2019t<\/em>\u00a0mean the null hypothesis is necessarily true (so we never \u201caccept\u201d the null); it only means that the particular study didn\u2019t yield sufficient evidence to reject the null. It\u00a0<em class=\"italic\">might<\/em>\u00a0be that the sample size was simply too small to detect a statistically significant difference.<\/p>\r\n<p id=\"e1aa355f90434c5aaf01a0f2c49e45ee\">However, in example 2*, we saw that when the sample proportion of .19 is obtained from a sample of size 400, it carries much more weight, and in particular, provides enough evidence that the proportion of marijuana users in the college is higher than .157 (the national figure). 
In\u00a0<em class=\"italic\">this<\/em>\u00a0case, the sample size of 400\u00a0<em class=\"italic\">was<\/em>\u00a0large enough to detect a statistically significant difference.<\/p>\r\n<p>The following activity lets you practice the ideas and terminology used in hypothesis testing when a result is not statistically significant.<\/p>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p id=\"c0a97d5d9ed64a60a7e9822c824d3ca9\">Suppose that only 40% of the U.S. public supported the general direction of the previous U.S. administration's policies. To gauge whether the nationwide proportion, p, of support for the\u00a0<em class=\"italic\">current<\/em>\u00a0administration is higher than 40%, a major polling organization conducts a random poll to test the hypotheses:<\/p>\r\n<p id=\"e4176a7efc9d49eea3c0a5593674aed0\">H<sub>o<\/sub>: p = .40<\/p>\r\n<p id=\"bc09925178d24ff9bf73dfa848ee62e6\">H<sub>a<\/sub>: p &gt; .40<\/p>\r\n<p id=\"d952849a45a544a5a10fa5951af4f062\">The results are reported to be\u00a0<em class=\"italic\">not statistically significant<\/em>, with a\u00a0<em class=\"italic\">p-value of .214<\/em>.<\/p>\r\n\r\n<div class=\"asx \">\r\n<div id=\"du4_m2_testprop10_tutor1\" class=\"activitywrap sectionNest flash\">\r\n<div class=\"actContain\">\r\n<div class=\"activity flash\">\r\n<div id=\"u4_m2_testprop10_tutor1\" class=\"flash_obj asx testFlash mark_flash\">\r\n<div id=\"ou4_m2_testprop10_tutor1\" class=\"page 271750 2963696 2963697 2963698 2963699\">\r\n<div>\r\n<p id=\"N1006E\">Decide whether each of the following statements is a valid conclusion or an invalid conclusion, based on the study:<\/p>\r\n[h5p id=\"188\"]\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<h2><span title=\"Quick scroll up\">3. One-Sided Alternative vs. 
Two-Sided Alternative<\/span><\/h2>\r\n<p id=\"c19504c1e2594a228002d51620c18a45\">Recall that earlier we noticed (only visually) that for a given value of the test statistic z, the p-value of the two-sided test is twice as large as the p-value of the one-sided test (when z falls on the side specified by the one-sided alternative). We will now discuss this issue further. In particular, we will use our example 2 (marijuana users at a certain college) to gain better intuition about this fact.<\/p>\r\n<p id=\"a225dc4c5ab84c51a0cb415c95caad64\">For illustration purposes, we are actually going to use example 2* (where out of a\u00a0<em class=\"italic\">sample of size 400<\/em>, 76 were marijuana users). Let\u2019s recall example 2*, but this time give two versions of it: the original version and a slightly changed version, which we\u2019ll call example 2**. The differences are highlighted.<\/p>\r\n\r\n<div id=\"b3f3ab16abb246ec93a15198f25c896f\" class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<h4>2*<\/h4>\r\n<div>\r\n<p id=\"e796de70099849a98b2f3255f6a9cef2\"><em class=\"italic\">There are rumors that students at a certain liberal arts college are more inclined to use drugs than U.S. college students in general.<\/em>\u00a0Suppose that in a simple random sample of 400 students from the college, 76 admitted to marijuana use. Do the data provide enough evidence to conclude that the proportion of marijuana users among the students in the college (p) is\u00a0<em class=\"italic\">higher<\/em>\u00a0than the national proportion, which is .157? 
(This number is reported by the Harvard School of Public Health.)<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div id=\"b1d847d16a934c68b6ba2c87e79636f0\" class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<h4>2**<\/h4>\r\n<div>\r\n<p id=\"fe40fffd0f484d87b39f008189f36d79\"><em class=\"italic\">The dean of students in a certain liberal arts college was interested in whether the proportion of students who use drugs in her college is different from the proportion among U.S. college students in general.<\/em>\u00a0Suppose that in a simple random sample of 400 students from the college, 76 admitted to marijuana use. Do the data provide enough evidence to conclude that the proportion of marijuana users among the students in the college (p)\u00a0<em class=\"italic\">differs<\/em>\u00a0from the national proportion, which is .157? (This number is reported by the Harvard School of Public Health.)<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\n[h5p id=\"189\"]\r\n\r\n<\/div>\r\n<\/div>\r\n<p id=\"b2808b5bfe224adfb30bbe61e7d9b4f7\">Indeed, in example 2* we suspect from the outset (based on the rumors) that the overall proportion (p) of marijuana smokers at the college is\u00a0<em class=\"italic\">higher<\/em>\u00a0than the reported national proportion of .157, and therefore the appropriate alternative is H<sub>a<\/sub>: p &gt; .157. 
In example 2**, as a result of the change of wording (which eliminated the part about the rumors), we simply wonder whether p is\u00a0<em class=\"italic\">different<\/em>\u00a0(in either direction) from the reported national proportion of .157, and therefore the appropriate alternative is two-sided: H<sub>a<\/sub>: p \u2260 .157. 
Would switching to the two-sided alternative have an effect on our results?<\/p>\r\n<p id=\"fd7eb22963a54531a09aa9e79df1e395\">Let\u2019s explore that.<\/p>\r\n\r\n<div id=\"bbb91b99778249fc9ae18a9862057ba8\" class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<h4>2*<\/h4>\r\n<div>\r\n<p id=\"e3cdac4a5b464033a6600ff7de069e38\">We already carried out the test for this example, and the results are summarized in the following figure:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"a784b2005bb74647bade44f4d51b99d1\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population Students at the college. We want to know p about this population, or what is the population proportion of students using marijuana. The hypotheses are H_0: p = .157 and H_a: p &amp;gt; .157 . We take a sample of 400 students, represented by a smaller circle, and find that 76 use marijuana. Conditions are met to use our method, so p-hat = 76\/400 = .19, z = 1.81, and p-value = .035 . The p-value is low enough to let us conclude that we can reject H_0\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image293.gif\" alt=\"A large circle represents the population Students at the college. We want to know p about this population, or what is the population proportion of students using marijuana. The hypotheses are H_0: p = .157 and H_a: p &amp;gt; .157 . We take a sample of 400 students, represented by a smaller circle, and find that 76 use marijuana. Conditions are met to use our method, so p-hat = 76\/400 = .19, z = 1.81, and p-value = .035 . 
The p-value is low enough to let us conclude that we can reject H_0\" \/><\/span><\/span>\r\n<p id=\"baca855d77b748478774e464b6ede114\">The following figure reminds you how the p-value was found (using the test statistic):<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"e255224896e740739c7d90221cb046d5\" class=\"img-responsive popimg aligncenter\" title=\"A N(0,1) curve with z-scores of 0 and 1.81 marked on the horizontal axis. The area to the right of 1.81 under the curve is the p-value = .035\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image294.gif\" alt=\"A N(0,1) curve with z-scores of 0 and 1.81 marked on the horizontal axis. The area to the right of 1.81 under the curve is the p-value = .035\" \/><\/span><\/span>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div id=\"d27a56ac1d1f4a349c27dedf6af21074\" class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<h4>2**<\/h4>\r\n<div>\r\n<p id=\"b071d595b9244e3a91b1d10cb46dbf27\">I. Here we are testing:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"c00f87e2c3a141eaae01fde6b9c92344\" class=\"img-responsive popimg aligncenter\" title=\"H_0: p = .157 and H_a: p \u2260 .157\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image295.gif\" alt=\"H_0: p = .157 and H_a: p \u2260 .157\" \/><\/span><\/span>\r\n<p id=\"a7537b55cbd548e9aefb667abe45d3e1\">II. 
Since we have the same data as in example 2* (76 marijuana users out of 400), we have the same sample proportion and the same test statistic:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"ef3980b82ac84102830a5a72a8e0ba39\" class=\"img-responsive popimg aligncenter\" title=\"p-hat = .19 z = 1.81\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image296.gif\" alt=\"p-hat = .19 z = 1.81\" \/><\/span><\/span>\r\n<p id=\"befd1bcc71cd4ae2a7b341e815b8b626\">III. Since the calculation of the p-value depends on the type of alternative, this is where the two examples start to differ. Statistical software tells us that the p-value for example 2** is 0.070. Here is a figure that reminds us how the p-value was calculated (based on the test statistic):<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"e877934274134200889f318c3ff31d32\" class=\"img-responsive popimg aligncenter\" title=\"A N(0,1) curve with z-scores of -1.81, 0, and 1.81 marked on the horizontal axis. The area to the right of 1.81 under the curve is .035 . The area to the left of -1.81 is also .035 . The p-value is the sum of these two areas, which is .07\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image297.gif\" alt=\"A N(0,1) curve with z-scores of -1.81, 0, and 1.81 marked on the horizontal axis. The area to the right of 1.81 under the curve is .035 . The area to the left of -1.81 is also .035 . The p-value is the sum of these two areas, which is .07\" \/><\/span><\/span>\r\n<p id=\"a0790a23e2f74835a505e77746e74db8\">IV. If we use the .05 level of significance, the p-value we got is not small enough (.07 &gt; .05), and therefore we cannot reject H<sub>o<\/sub>. 
In other words, the data do not provide enough evidence to conclude that the proportion of marijuana smokers in the college is different from the national proportion (.157).<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<p id=\"f277f12c88294e88a8aa353f0e93aec7\">What happened here?<\/p>\r\n<p id=\"fb9a1341fb744338b0531a28e6dc4f8b\">Numerically, it should be clear what happened. The p-value of the one-sided test (example 2*) is .035, suggesting that the results are significant at the .05 significance level. However, the p-value of the two-sided test (example 2**) is twice the p-value of the one-sided test, and is therefore 2 \u00d7 .035 = .07, suggesting that the results are not significant at the .05 significance level.<\/p>\r\n<p id=\"b53f81289a66457d992ce2104e1150ff\">Here is a more conceptual explanation:<\/p>\r\n<p id=\"ffdcc4ffd2554a84a975106dddd31cdc\">The idea is that in Example 2*, we began our hypothesis test with a piece of information (in the form of a rumor) about the unknown population proportion p, which gave us a sort of head-start towards the goal of rejecting the null hypothesis. The evidence provided by the data was then enough to cross the finish line and reject H<sub>o<\/sub>. In Example 2**, we had no prior information to go on, and the data alone were not enough evidence to cross the finish line and reject H<sub>o<\/sub>. The following figure illustrates this idea:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"fb66dd4559344499bbf52913b48cdd9c\" class=\"img-responsive popimg aligncenter\" title=\"Two &amp;apos;races&amp;apos; which illustrate why in the two-sided example we could not eliminate H_0. In the first race, H_0: p = .157, H_a: p &amp;gt; .157 . This is a one-sided hypothesis, so we get a head start on the race. The data gets us more progress along the race track, enough that we cross the &amp;apos;finish-line&amp;apos; (being less than the significance level of .05), so we have enough evidence to reject H_0. 
In the two-sided problem where H_0: p = .157, H_a: p \u2260 .157, we do not have a head start, since we are not given the information of which side. So, we only have the data to give us progress on the race, which isn&amp;apos;t enough progress to cross the &amp;apos;finish-line.&amp;apos;\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image298.gif\" alt=\"Two &amp;apos;races&amp;apos; which illustrate why in the two-sided example we could not eliminate H_0. In the first race, H_0: p = .157, H_a: p &amp;gt; .157 . This is a one-sided hypothesis, so we get a head start on the race. The data gets us more progress along the race track, enough that we cross the &amp;apos;finish-line&amp;apos; (being less than the significance level of .05), so we have enough evidence to reject H_0. In the two-sided problem where H_0: p = .157, H_a: p \u2260 .157, we do not have a head start, since we are not given the information of which side. So, we only have the data to give us progress on the race, which isn&amp;apos;t enough progress to cross the &amp;apos;finish-line.&amp;apos;\" \/><\/span><\/span>\r\n<p id=\"d1907e9896d44fc7af7b38c94ce1d0d1\">We can summarize and say that in general it is harder to reject H<sub>o<\/sub>\u00a0against a two-sided H<sub>a<\/sub>\u00a0because the p-value is twice as large. Intuitively, a one-sided alternative gives us a head-start, and on top of that we have the evidence provided by the data. 
When our alternative is two-sided, we get no head-start and all we have are the data, and therefore it is harder to cross the finish line and reject H<sub>o<\/sub>.<\/p>\r\n\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p id=\"b57d95b661f34e4c9c7c9c16dd424e03\">Consider the following two hypothesis testing scenarios for the population proportion (p) and corresponding studies:<\/p>\r\n<p id=\"f900f0fd7a4c4bc9af87711ad03e979c\"><em class=\"italic\">I.<\/em>\u00a0The UCLA Internet Report (February 2003) estimated that roughly 8.7% of Internet users are extremely concerned about credit card fraud when buying online. A study was designed in order to examine whether that proportion has changed since.<\/p>\r\n<p id=\"bf94b29bb3534b5894cc6ab035644a79\"><em class=\"italic\">II.<\/em>\u00a0The UCLA Internet Report (February 2003) estimated that roughly 8.7% of Internet users are extremely concerned about credit card fraud when buying online. In light of the increasing problem of spyware, a study was designed in order to examine whether that proportion has increased since.<\/p>\r\n[h5p id=\"190\"]\r\n\r\n<\/div>\r\n<\/div>\r\n<h2><span title=\"Quick scroll up\">4. Hypothesis Testing and Confidence Intervals<\/span><\/h2>\r\n<p id=\"d01bed76f8df4994a15f3ea2e5d94377\">The last topic we want to discuss is the relationship between hypothesis testing and confidence intervals. 
Even though the flavor of these two forms of inference is different (confidence intervals estimate a parameter, and hypothesis testing assesses the evidence in the data against one claim and in favor of another), there is a strong link between them.<\/p>\r\n<p id=\"bad6a7f471594153b6a300fa9b77f8a5\">We will explain this link (using the z-test and confidence interval for the population proportion) and then show how confidence intervals can be used after a test has been carried out.<\/p>\r\n<p id=\"b9078a5b967749b193bc3176a318efde\">Recall that a confidence interval gives us a set of plausible values for the unknown population parameter. We may therefore examine a confidence interval to informally decide whether a proposed value of the population proportion seems plausible.<\/p>\r\n<p id=\"cad80ff4f4124760b82ccddaa6fca9d2\">For example, if a 95% confidence interval for p, the proportion of all U.S. adults already familiar with Viagra in May 1998, was (.61, .67), then it seems clear that we should be able to reject a claim that only 50% of all U.S. adults were familiar with the drug, since based on the confidence interval, .50 is not one of the plausible values for p.<\/p>\r\n<p id=\"a66e1b1072ad4af3a857fb927321b1d5\">In fact, the information provided by a confidence interval can be formally related to the information provided by a hypothesis test. 
(<em class=\"italic\">Comment:<\/em>\u00a0The relationship is more straightforward for two-sided alternatives, and so we will not present results for the one-sided cases.)<\/p>\r\n<p id=\"c3151cd7fb0d4465af22bb8cb269b6a3\">Suppose we want to carry out the\u00a0<em class=\"italic\">two-sided test:<\/em><\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"db49bb1a74f345c4b66d3951487011be\" class=\"img-responsive popimg aligncenter\" title=\"H_0: p = p_0 and H_a: p \u2260 p_0\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image299.gif\" alt=\"H_0: p = p_0 and H_a: p \u2260 p_0\" \/><\/span><\/span>\r\n<p id=\"e09dfdc75c914fa79cdaf545b5c925fd\">using a significance level of .05.<\/p>\r\n<p id=\"cd0dc96c66ff41f58ed171a01a8c8a17\">An alternative way to perform this test is to find a 95%\u00a0<em class=\"italic\">confidence interval<\/em>\u00a0for p and check:<\/p>\r\n<p id=\"a4635f75a8cd469d8e993d99225a44d5\">If <em>p<sub>0<\/sub><\/em> falls <em class=\"italic\">outside<\/em>\u00a0the confidence interval,\u00a0<em class=\"italic\">reject<\/em> H<sub>o<\/sub>.<\/p>\r\n<p id=\"ce19de1cc30049769f91cd772b6e353e\">If <em>p<sub>0<\/sub><\/em> falls\u00a0<em class=\"italic\">inside<\/em>\u00a0the confidence interval,\u00a0<em class=\"italic\">do not reject<\/em> H<sub>o<\/sub>.<\/p>\r\n<p id=\"a42d42ad1cc745f387fe9caf98aa509c\">In other words, if <em>p<sub>0<\/sub><\/em>\u00a0is not one of the plausible values for p, we reject H<sub>o<\/sub>.<\/p>\r\n<p id=\"e27bb1bfd4674cd08e06b20fb9bd4078\">If <em>p<sub>0<\/sub><\/em> is a plausible value for p, we cannot reject H<sub>o<\/sub>.<\/p>\r\n<p id=\"fa67ec65a72c42429ba33f88bfdbb3eb\">(<em class=\"italic\">Comment:<\/em>\u00a0Similarly, the results of a test using a significance level of .01 can be related to the 99% confidence interval. For the population proportion, this correspondence is approximate rather than exact, because the confidence interval computes its standard error from [latex]\\hat{p}[\/latex], while the z-test computes it from\u00a0<em>p<sub>0<\/sub><\/em>.)<\/p>\r\n<p id=\"de79fd1d4c95416b880990511597daeb\">Let\u2019s look at two 
examples:<\/p>\r\n\r\n<div id=\"da587467bc314d46b104f61a20099040\" class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p id=\"b376a4b40c8344d680eea4e7316c5e88\">Recall example 3, where we wanted to know whether the proportion of U.S. adults who support the death penalty for convicted murderers has changed since 2003, when it was .64.<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"a2dfdd3a552341e1b34016379d739813\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population US Adults. We want to know p about this population, which is population proportion which support the death penalty. The question we want to answer is &amp;quot;has p changed since 2003 (when it was .64)?&amp;quot; We take a sample of 1000 US Adults, represented by a smaller circle. We find that 675 are in favor. p-hat =675\/1000 = .675.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image223.gif\" alt=\"A large circle represents the population US Adults. We want to know p about this population, which is population proportion which support the death penalty. The question we want to answer is &amp;quot;has p changed since 2003 (when it was .64)?&amp;quot; We take a sample of 1000 US Adults, represented by a smaller circle. We find that 675 are in favor. 
p-hat =675\/1000 = .675.\" \/><\/span><\/span>\r\n<p id=\"beab4529e3d9490c8d19432d6d8f118c\">We are testing:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"ceb6f023244949d5881a0933d9335913\" class=\"img-responsive popimg aligncenter\" title=\"H_0: p = .64 and H_a: p \u2260 .64;\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image300.gif\" alt=\"H_0: p = .64 and H_a: p \u2260 .64;\" \/><\/span><\/span>\r\n<p id=\"a2c70266a8794a9db0e9aa235e98c809\">and as the figure reminds us, we took a sample of 1,000 U.S. adults, and the data told us that 675 supported the death penalty for convicted murderers (i.e.\u00a0[latex]\\hat{p}=.675[\/latex]).<\/p>\r\n<p id=\"b4ef6b335e0e42fb9f9bb377efb78c99\">A 95% confidence interval for p, the proportion of\u00a0<em class=\"italic\">all<\/em>\u00a0U.S. adults who support the death penalty, is:<\/p>\r\n[latex].675\\pm2\\sqrt{\\frac{.675(1-.675)}{1000}}\\approx.675\\pm.03=\\left(.645,\\ .705\\right)[\/latex]\r\n<p id=\"f803643a0e904b6cbe32b4485d33c2db\">Since the 95% confidence interval for p does not include .64 as a plausible value for p, we can reject H<sub>o<\/sub>\u00a0and conclude (as we did before) that the proportion of U.S. adults who support the death penalty for convicted murderers has changed since 2003.<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"b40bab0cc56b4686a9bd633af805b567\" class=\"img-responsive popimg aligncenter\" title=\"A number line illustrating the 95% confidence interval for p. The interval is (.645, .705). In H_0, p = .64, which is outside of this interval, so we can reject H_0: p = .64 .\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image302.gif\" alt=\"A number line illustrating the 95% confidence interval for p. The interval is (.645, .705). 
In H_0, p = .64, which is outside of this interval, so we can reject H_0: p = .64 .\" \/><\/span><\/span>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div id=\"cc68945d65d14b889df1a8de8c98ebf8\" class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p id=\"b077f9f589ba45a195b1995f61b8f341\">You and your roommate are arguing about whose turn it is to clean the apartment. Your roommate suggests that you settle this by tossing a coin and takes one out of a locked box he has on the shelf. Suspecting that the coin might not be fair, you decide to test it first. You toss the coin 80 times, thinking to yourself that if, indeed, the coin is fair, you should get around 40 heads. Instead, you get 48 heads. You are puzzled. You are not sure whether getting 48 heads out of 80 is enough evidence to conclude that the coin is unbalanced, or whether this is a result that could have happened just by chance when the coin is fair.<\/p>\r\n<p id=\"b2e973e7b1bf409a88c6fec4a2c512fa\">Statistics can help you answer this question.<\/p>\r\n<p id=\"f4dfe567d3cc45cd883a8026d6b7704c\">Let p be the true proportion (probability) of heads. 
We want to test whether the coin is fair or not:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"ebb9890086ef4dce93b9525aab3ac33f\" class=\"img-responsive popimg aligncenter\" title=\"H_0: p = .5, H_a: p \u2260 .5\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image303.gif\" alt=\"H_0: p = .5, H_a: p \u2260 .5\" \/><\/span><\/span>\r\n<p id=\"fc375fd9858745afac3f220519fe302a\">Out of n = 80 tosses we got 48 heads, so the sample proportion of heads is [latex]\\hat{p}=\\frac{48}{80}=.6[\/latex].<\/p>\r\n<p id=\"d642f2d91b53451da8bb13cfb5a4440b\">The 95% confidence interval for p, the true proportion of heads for this coin, is:<\/p>\r\n<p>[latex].6\\pm 2 \\sqrt{\\frac{.6(1-.6)}{80}}\\approx .6\\pm .11=(.49,.71)[\/latex]<\/p>\r\n<p id=\"acb3eab7d2134911b3393824bd800459\">Since in this case .5 is one of the plausible values for p, we cannot reject H<sub>o<\/sub>. In other words, the data do not provide enough evidence to conclude that the coin is not fair.<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"b4608b71275c4648a02059ffb9876b9a\" class=\"img-responsive popimg aligncenter\" title=\"A number line showing the 95% confidence interval for p, which is (.49, .71). H_0 is p = .5, which falls within this interval, so we cannot reject H_0: p = .5 .\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image306.gif\" alt=\"A number line showing the 95% confidence interval for p, which is (.49, .71). 
H_0 is p = .5, which falls within this interval, so we cannot reject H_0: p = .5 .\" \/><\/span><\/span>\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p id=\"ed683b7754ba4a07a00abffab8e38ad5\">The UCLA Internet Report (February 2003) estimated that roughly 8.7% of Internet users are extremely concerned about credit card fraud when buying online. A study was designed to examine whether that proportion has changed since then. Let p be the proportion of all Internet users who are extremely concerned about credit card fraud. In this study we are therefore testing:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"df8d7115bdee4053a5086811a084a665\" class=\"img-responsive popimg aligncenter\" title=\"H_0: p = .087, H_a: p \u2260 .087\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image390.gif\" alt=\"H_0: p = .087, H_a: p \u2260 .087\" \/><\/span><\/span>\r\n<p id=\"ce85aeacfc5b4d6b8ae1cbe9c346b85f\">Based on the collected data, a 95% confidence interval for p was found to be (.08, .14).<\/p>\r\n[h5p id=\"191\"]\r\n<p id=\"ec5707347543451fad6cb073a8d7b8fe\">The UCLA Internet Report (February 2003) estimated that roughly 60.5% of U.S. adults use the Internet at work for personal use. A follow-up study was conducted to explore whether that figure has changed since then. Let p be the proportion of U.S. 
adults who use the Internet at work for personal use.<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"f52bc4a09cc04c4d81529cfc02542a66\" class=\"img-responsive popimg aligncenter\" title=\"H_0: p = .605, H_a: p \u2260 .605\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image391.gif\" alt=\"H_0: p = .605, H_a: p \u2260 .605\" \/><\/span><\/span>\r\n<p id=\"f6e68aaa4e2c4afe854b71d933f362d0\">Based on the collected data, the p-value of the test was found to be .001.<\/p>\r\n[h5p id=\"192\"]\r\n\r\n<\/div>\r\n<\/div>\r\n<h2><span title=\"Quick scroll up\">Comment<\/span><\/h2>\r\n<p id=\"c7d73f362d7f4e9a8f6547e0733a6c2f\">The context of the coin-tossing example is a good opportunity to bring up an important point that was discussed earlier.<\/p>\r\n<p id=\"f813b75445614a33a22835bf7d8ade54\">Even though we use .05 as a cutoff to guide our decision about whether the results are significant, we should not treat it as inviolable; we should always apply our own judgment. Let\u2019s look at the coin-tossing example again.<\/p>\r\n<p id=\"f1b85abb478b49b488c8bb38ef67d3cb\">It turns out that the p-value of this test is .0734. In other words, if you toss a\u00a0<em class=\"italic\">fair<\/em>\u00a0coin 80 times, getting a sample proportion of heads of 48\/80 = .6 (or something even more extreme) is not extremely unlikely, but it is fairly unlikely (probability .0734). It is true that at the .05 significance level (cutoff), .0734 is not small enough to conclude that the coin is not fair. 
However, if you really don\u2019t want to clean the apartment, the p-value might be small enough for you to ask your roommate to use a different coin, or to provide one yourself!<\/p>\r\n<p id=\"N10B01\">Here is our final point on this subject:<\/p>\r\n<p id=\"N10B04\">When the data provide enough evidence to reject H<sub>o<\/sub>, we can conclude (depending on the alternative hypothesis) that the population proportion is either less than, greater than, or not equal to the null value\u00a0[latex]p_{0}[\/latex]. However, the test alone tells us nothing more specific about its actual value. It may therefore be of interest to follow the test with a 95% confidence interval, which gives more insight into the actual value of p.<\/p>\r\n\r\n<div class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div>\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<div class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div>\r\n<p id=\"N10B1C\">In example 3,<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"_i_0\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population US Adults. We want to know p about this population, which is population proportion which support the death penalty. The two hypotheses are H_0: p = .64 and H_a: p \u2260 .64 . We take a sample of 1000 US Adults, represented by a smaller circle. We find that 675 are in favor. p-hat = 675\/1000 = .675, z = 2.31, and the p-value is .021 , which is small enough to let us reject H_0.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image277.gif\" alt=\"A large circle represents the population US Adults. We want to know p about this population, which is population proportion which support the death penalty. 
The two hypotheses are H_0: p = .64 and H_a: p \u2260 .64 . We take a sample of 1000 US Adults, represented by a smaller circle. We find that 675 are in favor. p-hat = 675\/1000 = .675, z = 2.31, and the p-value is .021 , which is small enough to let us reject H_0.\" \/><\/span><\/span>\r\n<p id=\"N10B25\">we concluded that the proportion of U.S. adults who support the death penalty for convicted murderers has changed since 2003, when it was .64. It is probably of interest not only to know that the proportion has changed, but also to estimate what it has changed to. We calculated the 95% confidence interval for p earlier and found that it is (.645, .705).<\/p>\r\n<p id=\"N10B28\">We can combine our conclusions from the test and the confidence interval and say:<\/p>\r\n<p id=\"N10B2B\">The data provide evidence that the proportion of U.S. adults who support the death penalty for convicted murderers has changed since 2003, and we are 95% confident that it is now between .645 and .705 (i.e., between 64.5% and 70.5%).<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div class=\"examplewrap\">\r\n<div class=\"exHead\"><\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p id=\"N10B31\">Let\u2019s look at example 1 to see how a confidence interval following a test can be insightful in a different way.<\/p>\r\n<p id=\"N10B34\">Here is a summary of example 1:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"_i_1\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population of products produced by the machine (following the repair). We want to know p about this population, or what is the proportion of defective products. 
The two hypotheses are H_0: p = .20 and H_a: p &lt; .20. We take a sample of 400 products, represented by a smaller circle. We find that 64 of these are defective. p-hat = 64\/400 = .16, and z = -2 and p-value = .023 . Since the p-value is small we conclude that H_0 can be rejected.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image275.gif\" alt=\"A large circle represents the population of products produced by the machine (following the repair). We want to know p about this population, or what is the proportion of defective products. The two hypotheses are H_0: p = .20 and H_a: p &lt; .20. We take a sample of 400 products, represented by a smaller circle. We find that 64 of these are defective. p-hat = 64\/400 = .16, and z = -2 and p-value = .023 . Since the p-value is small we conclude that H_0 can be rejected.\" \/><\/span><\/span>\r\n<p id=\"N10B3D\">We conclude that as a result of the repair, the proportion of defective products has been reduced to below .20 (which was the proportion prior to the repair). It is probably of great interest to the company not only to know that the proportion of defective products has been reduced, but also to estimate what it has been reduced to, to get a better sense of how effective the repair was. A 95% confidence interval for p in this case is:<\/p>\r\n[latex].16\\pm2\\sqrt{\\frac{.16(1-.16)}{400}}\\approx.16\\pm.037=\\left(.123,\\ .197\\right)[\/latex]\r\n<p id=\"N10BB7\">We can therefore say that the data provide evidence that the proportion of defective products has been reduced, and we are 95% confident that it has been reduced to somewhere between 12.3% and 19.7%. 
This is very useful information, since it tells us that even though the results were significant (i.e., the repair reduced the proportion of defective products), the repair might not have been effective enough if it reduced that proportion only to somewhere in the range given by the confidence interval. This, of course, ties back to the idea of statistical significance vs. practical importance that we discussed earlier. Even though the results are significant (H<sub>o<\/sub>\u00a0was rejected), practically speaking, the repair might be considered ineffective.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<h2><span title=\"Quick scroll up\">Let\u2019s summarize<\/span><\/h2>\r\n<p id=\"N10BC9\">Even though this unit is about the z-test for the population proportion, it is loaded with very important ideas that apply to hypothesis testing in general. We\u2019ve already summarized the details that are specific to the z-test for proportions, so the purpose of this summary is to highlight the general ideas.<\/p>\r\n<p id=\"N10BCC\">The process of hypothesis testing has four steps:<\/p>\r\n<p id=\"N10BCF\"><em>I. Stating the null and alternative hypotheses (H<sub>o<\/sub>\u00a0and H<sub>a<\/sub>).<\/em><\/p>\r\n<p id=\"N10BDB\"><em>II.<\/em>\u00a0Obtaining a random sample (or at least one that can be considered random) and collecting data. Using the data:<\/p>\r\n<p id=\"N10BE1\">*\u00a0<em>Check that the conditions<\/em>\u00a0under which the test can be reliably used are met.<\/p>\r\n<p id=\"N10BE7\">*\u00a0<em>Summarize the data using a test statistic.<\/em><\/p>\r\n<p id=\"N10BED\">The test statistic is a measure of the evidence in the data against H<sub>o<\/sub>. The larger the test statistic is in magnitude, the more evidence the data present against H<sub>o<\/sub>.<\/p>\r\n<p id=\"N10BF6\"><em>III. 
Finding the p-value of the test.<\/em><\/p>\r\n<p id=\"N10BFC\">The p-value is the probability of getting data like those observed (or even more extreme) assuming that the null hypothesis is true, and is calculated using the null distribution of the test statistic. The p-value is a measure of the evidence against H<sub>o<\/sub>. The smaller the p-value, the more evidence the data present against H<sub>o<\/sub>.<\/p>\r\n<p id=\"N10C05\"><em>IV. Making conclusions.<\/em><\/p>\r\n<p id=\"N10C0B\">\u2013 Conclusions about the\u00a0<em>significance of the results:<\/em><\/p>\r\n<p id=\"N10C11\">If the p-value is small, the data present enough evidence to reject H<sub>o<\/sub>\u00a0(and accept H<sub>a<\/sub>).<\/p>\r\n<p id=\"N10C1A\">If the p-value is not small, the data do not provide enough evidence to reject H<sub>o<\/sub>.<\/p>\r\n<p id=\"N10C20\">To help guide our decision, we use the significance level as a cutoff for what is considered a small p-value. The significance cutoff is usually set at .05, but should not be considered inviolable.<\/p>\r\n<p id=\"N10C23\">\u2013 Conclusions\u00a0<em>in the context<\/em>\u00a0of the problem.<\/p>\r\n<p id=\"N10C29\">Results that are based on a larger sample carry more weight, and therefore\u00a0<em>as the sample size increases, the same observed effect becomes more statistically significant.<\/em><\/p>\r\n<p id=\"N10C2F\">Even a very small and practically unimportant effect becomes statistically significant with a large enough sample size. The\u00a0<em>distinction between statistical significance and practical importance<\/em>\u00a0should therefore always be considered.<\/p>\r\n<p id=\"N10C35\">For given data, the\u00a0<em>p-value of the two-sided test is twice as large as the p-value of the one-sided test<\/em>\u00a0(when the data fall in the direction specified by the one-sided alternative). It is therefore harder to reject H<sub>o<\/sub>\u00a0in the two-sided case than it is in the one-sided case in the sense that stronger evidence is required. 
Intuitively, the hunch or information that leads us to use the one-sided test can be regarded as a head-start toward the goal of rejecting H<sub>o<\/sub>.<\/p>\r\n<p id=\"N10C41\"><em>Confidence intervals can be used in order to carry out two-sided tests<\/em>\u00a0(at the .05 significance level). If the null value is not included in the confidence interval (i.e., is not one of the plausible values for the parameter), we have enough evidence to reject H<sub>o<\/sub>. Otherwise, we cannot reject H<sub>o<\/sub>.<\/p>\r\n<p id=\"N10C4C\">If the results are significant, it might be of interest to\u00a0<em>follow up the tests with a confidence interval<\/em> in order to get insight into the actual value of the parameter of interest.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>","rendered":"<div id=\"lobjh\" class=\"\">\n<div class=\"textbox textbox--learning-objectives\">\n<header class=\"textbox__header\">\n<h2 class=\"textbox__title\">Learning Objectives<\/h2>\n<\/header>\n<div class=\"textbox__content\">\n<ul>\n<li id=\"specify_hypotheses\">In a given context, specify the null and alternative hypotheses for the population proportion and mean.<\/li>\n<li id=\"carry_out_hypothesis_testing\">Carry out hypothesis testing for the population proportion and mean (when appropriate), and draw conclusions in context.<\/li>\n<li id=\"apply_concepts\">Apply the concepts of: sample size, statistical significance vs. 
practical importance, and the relationship between hypothesis testing and confidence intervals.<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"a3c0a6255aaa4086beefe1d80d59db02\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Overview<\/span><\/h2>\n<p id=\"e76b43d3b24e4947b36877ae688e7bf9\">Now that we understand the process we go through in hypothesis testing and the logic behind it, we are ready to start learning about specific statistical tests (also known as significance tests).<\/p>\n<p id=\"ebc6ca81c302463eb6c9bb014971df57\">The first test we are going to learn is the test about the population proportion (p). This test is widely known as the\u00a0<em class=\"italic\">z-test for the population proportion (p).<\/em>\u00a0(We will see later where the \u201cz-test\u201d part comes from.)<\/p>\n<p id=\"eaffff59cb6e4825a4c13e3e6b65f0b4\">When we conduct a test about a population proportion, we are working with a categorical variable. Later in the course, after we have learned a variety of hypothesis tests, we will need to be able to identify which test is appropriate for which situation. 
Identifying the variable as categorical or quantitative is an important component of choosing an appropriate hypothesis test.<\/p>\n<div class=\"asx\">\n<div id=\"du4_m2_testprop1_tutor1\" class=\"activitywrap purpose learnbydoing flash\">\n<div class=\"activityhead\">\n<div class=\"purposeType purposelearnbydoing\" title=\"\"><span class=\"scnReader\">learn by doing<\/span><\/div>\n<\/div>\n<div class=\"actContain\">\n<div class=\"activity flash\">\n<div id=\"u4_m2_testprop1_tutor1\" class=\"flash_obj asx testFlash mark_flash\">\n<div id=\"ou4_m2_testprop1_tutor1\" class=\"page 271710 2962796 2962797 2962798 2962799\">\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div class=\"activity flash\">\n<div class=\"flash_obj asx testFlash mark_flash\">\n<div class=\"page 271710 2962796 2962797 2962798 2962799\">\n<div><\/div>\n<div id=\"2962798\" class=\"question ddfb\">\n<div>\n<p id=\"N10090\">\n<div id=\"h5p-161\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-161\" class=\"h5p-iframe\" data-content-id=\"161\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.2 Learn by doing 1\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p id=\"bb62411327c348b0b2e4d9fff00047f6\">Our discussion of hypothesis testing for the population proportion p follows the four steps of hypothesis testing that we introduced in our general discussion of hypothesis testing, but this time we go into more detail. More specifically, we learn how the test statistic and p-value are calculated and interpreted.<\/p>\n<p id=\"e7d8247936434ba4842f20b9bc23faf9\">Once we learn how to carry out the test for the population proportion p, we discuss some general topics that are related to hypothesis testing. 
More specifically, we see what role the sample size plays and understand how hypothesis testing and interval estimation (confidence intervals) are related.<\/p>\n<p id=\"a00493797c6c48c089285e46c2893c4e\">Let\u2019s start by introducing the three examples, which will be the leading examples in our discussion. Each example is followed by a figure illustrating the information provided, as well as the question of interest.<\/p>\n<\/div>\n<\/div>\n<div id=\"fb54f09b895242408f0dda3a5d7f87bb\" class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example 1<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div>\n<p id=\"dbd0f3a5a2714b26b474a1a602817bac\">A machine is known to produce 20% defective products, and is therefore sent for repair. After the machine is repaired, 400 products produced by the machine are chosen at random and 64 of them are found to be defective. Do the data provide enough evidence that the proportion of defective products produced by the machine (p) has been\u00a0<em class=\"italic\">reduced<\/em>\u00a0as a result of the repair?<\/p>\n<p id=\"df6fa0d7c2704114a0c8d8a663986876\">The following figure displays the information, as well as the question of interest:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"afee653830e14a3ca4e2825ecef97bd1\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population of products produced by the machine (following the repair). We want to know p about this population, or what is the proportion of defective products. The question we wish to answer is &amp;quot;is p still .20 or has it been reduced?&amp;quot; We take a sample of 400 products, represented by a smaller circle. 
We find that 64 of these are defective.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image212.gif\" alt=\"A large circle represents the population of products produced by the machine (following the repair). We want to know p about this population, or what is the proportion of defective products. The question we wish to answer is &amp;quot;is p still .20 or has it been reduced?&amp;quot; We take a sample of 400 products, represented by a smaller circle. We find that 64 of these are defective.\" \/><\/span><\/span><\/p>\n<p id=\"a0b0e5c58e724149a8ba352b2b6d6a27\">The question of interest helps us formulate the null and alternative hypotheses in terms of p, the proportion of defective products produced by the machine following the repair:<\/p>\n<p id=\"c832df819f7d4eb8b63136557d0bca07\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">o<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0p = .20 (No change; the repair did not help).<\/p>\n<p id=\"e82d35e3d6ba49ae89504dd54a189c2a\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">a<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0p &lt; .20 (The repair was effective).<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"a426f1c3ebfd4078981577a8cdd3ece3\" class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example 2<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div>\n<p id=\"bf2f9e1d23564014bf8143bdf8cd591b\">There are rumors that students at a certain liberal arts college are more inclined to use drugs than U.S. college students in general. Suppose that in a simple random sample of 100 students from the college, 19 admitted to marijuana use. 
Do the data provide enough evidence to conclude that the proportion of marijuana users among the students in the college (p) is\u00a0<em class=\"italic\">higher<\/em>\u00a0than the national proportion, which is .157? (This number is reported by the Harvard School of Public Health.)<\/p>\n<p id=\"fbf6710943db465e934ef1ef790df49b\">Again, the following figure displays the information as well as the question of interest:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"df34bf92bc44434cac8551a0b7fa1816\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population Students at the college. We want to know p about this population, or what is the population proportion of students using marijuana. The question we wish to answer is &amp;quot;is p .157 (like the national figure) or higher?&amp;quot; We take a sample of 100 students, represented by a smaller circle. We find that 19 use marijuana.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image213.gif\" alt=\"A large circle represents the population Students at the college. We want to know p about this population, or what is the population proportion of students using marijuana. The question we wish to answer is &amp;quot;is p .157 (like the national figure) or higher?&amp;quot; We take a sample of 100 students, represented by a smaller circle. 
We find that 19 use marijuana.\" \/><\/span><\/span><\/p>\n<p id=\"fe08ff72850e47de8d19a10a7e775083\">As before, we can formulate the null and alternative hypotheses in terms of p, the proportion of students in the college who use marijuana:<\/p>\n<p id=\"ff7aaca85e87423ba460a3223e3998fb\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">o<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0p = .157 (same as among all college students in the country).<\/p>\n<p id=\"a3333c60cb12413e945769d30503afac\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">a<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0p &gt; .157 (higher than the national figure).<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"b9265c113f964dcdaf4d9b79dd242682\" class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example 3<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div>\n<p id=\"e70b10b4fb554b97935df11e0a2503c2\">Polls on certain topics are conducted routinely in order to monitor changes in the public\u2019s opinions over time. One such topic is the death penalty. In 2003 a poll estimated that 64% of U.S. adults support the death penalty for a person convicted of murder. In a more recent poll, 675 out of 1,000 U.S. adults chosen at random were in favor of the death penalty for convicted murderers. Do the results of this poll provide evidence that the proportion of U.S. adults who support the death penalty for convicted murderers (p) <em class=\"italic\">changed <\/em>between 2003 and the later poll?<\/p>\n<p id=\"d3dc7a96486d4953a5d5e5431f533071\">Here is a figure that displays the information, as well as the question of interest:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"b15bcf532b0145cbbb8cf4f2228f4d70\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population US Adults. 
We want to know p about this population, which is population proportion which support the death penalty. The question we wish to answer is &amp;quot;has p changed since 2003 (when it was .64)?&amp;quot; We take a sample of 1000 US Adults, represented by a smaller circle. We find that 675 are in favor.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image214.gif\" alt=\"A large circle represents the population US Adults. We want to know p about this population, which is population proportion which support the death penalty. The question we wish to answer is &amp;quot;has p changed since 2003 (when it was .64)?&amp;quot; We take a sample of 1000 US Adults, represented by a smaller circle. We find that 675 are in favor.\" \/><\/span><\/span><\/p>\n<p id=\"d63555cf2b914c3da96b849aaebd11a9\">Again, we can formulate the null and alternative hypotheses in terms of p, the proportion of U.S. adults who support the death penalty for convicted murderers.<\/p>\n<p id=\"af6d8eac990e41259f67b9117b1c8e9d\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">o<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0p = .64 (No change from 2003).<\/p>\n<p id=\"c158fa05ebca488db1915f422cd61715\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">a<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0p \u2260 .64 (Some change since 2003).<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p>According to the American Association of Community Colleges, 23% of community college students receive federal grants. The California Community College Chancellor\u2019s Office anticipates that the percentage is smaller for California community college students. 
They collect a sample of 1,000 community college students in California and find that 210 received federal grants.<\/p>\n<div id=\"h5p-162\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-162\" class=\"h5p-iframe\" data-content-id=\"162\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.2 Learn by doing 2\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p>Using data from 2008, the American Association of Community Colleges (AACC) reports that community college students constitute 46% of all U.S. undergraduates. Given the downturn in the U.S. economy, the AACC anticipates an increase in this percentage for 2010. A poll of 500 randomly chosen undergraduates taken in 2010 indicates that 52% are attending a community college.<\/p>\n<div id=\"h5p-163\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-163\" class=\"h5p-iframe\" data-content-id=\"163\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.2 Did I get this 1\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<p id=\"b88721ac15ab446bbf37500bff8394bf\">Recall that there are basically 4 steps in the process of hypothesis testing:<\/p>\n<p id=\"d239382347e845389ba491a7ec80aa9f\">1. State the null and alternative hypotheses.<\/p>\n<p id=\"d0aa1929bd8244fe99ffda6559663c9a\">2. Collect relevant data from a random sample and summarize them (using a test statistic).<\/p>\n<p id=\"bc5c8d3b03464a089c091575d8306317\">3. Find the p-value, the probability of observing data like those observed assuming that H<sub>o<\/sub>\u00a0is true.<\/p>\n<p id=\"fe7d1129b0b0463fbf2acb03edb5564c\">4. 
Based on the p-value, decide whether we have enough evidence to reject H<sub>o<\/sub>\u00a0(and accept H<sub>a<\/sub>), and draw our conclusions in context.<\/p>\n<p id=\"ebad0df85d604d10984c71888e87427b\">We are now going to go through these steps as they apply to hypothesis testing for the population proportion p. Although the details will be specific to this particular test, some of the ideas that we will add apply to hypothesis testing in general.<\/p>\n<div id=\"c825f2c0acc64131b1b4816ca0288d1e\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">1. Stating the Hypotheses<\/span><\/h2>\n<p id=\"c3dd31be633149f08e5b33574fd00035\">Here again are the three sets of hypotheses being tested in our three examples:<\/p>\n<div id=\"fbf3d95b57cd45d0b74254247d1b5824\" class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example 1<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div>\n<p id=\"be9358131b9445c49c06b43d6a1b4543\">Has the proportion of defective products been reduced as a result of the repair?<\/p>\n<p id=\"d9d5e0fc474547bea7491be03d78aeac\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">o<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0p = .20 (No change; the repair did not help).<\/p>\n<p id=\"ecb5fa65497840f3a84597d08186be9b\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">a<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0p &lt; .20 (The repair was effective).<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"b4fc1a1179fe4e02a2aac9f4284062b5\" class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example 2<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div>\n<p id=\"c7e1c9c4c72a4e238320bea0d984a607\">Is the proportion of
marijuana users in the college higher than the national figure?<\/p>\n<p id=\"e21353fdbec74a118a3ea61c2d510e67\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">o<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0p = .157 (Same as among all college students in the country).<\/p>\n<p id=\"fd7e524dba2149c7bb4a72774fc1f117\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">a<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0p &gt; .157 (Higher than the national figure).<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"cc29a6e7218e401fa61b7e97114626f7\" class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example 3<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div>\n<p id=\"e11c01be33cc482192cb32e6edca2cbc\">Did the proportion of U.S. adults who support the death penalty change between 2003 and a later poll?<\/p>\n<p id=\"ae136a2a0dbb477bbbf6a6fc8bed05a1\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">o<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0p =.64 (No change from 2003).<\/p>\n<p id=\"fd6900d28a564ff6bfa28000c381bb14\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">a<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0p \u2260.64 (Some change since 2003).<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p id=\"ec7de58bbd6b4f8b9e42fc9d9e17591f\">Note that the null hypothesis always takes the form:<\/p>\n<p id=\"aec73d9a03cd40658caaf90d500cb68b\">H<sub>o<\/sub>: p = some value<\/p>\n<p id=\"ea20157d02ce4b4bad892cc8a5cde326\">and the alternative hypothesis takes one of the following three forms:<\/p>\n<p id=\"c8d10433033e4243a57c2e994e18c42a\">H<sub>a<\/sub>: p &lt; that value (like in example 1)\u00a0<em class=\"italic\">or<\/em><\/p>\n<p id=\"f16e5cc3e8d4454d856f4bd28cccbce4\">H<sub>a<\/sub>: p &gt; that value (like in example 2)\u00a0<em class=\"italic\">or<\/em><\/p>\n<p
id=\"e909cc7daa6240de9003c269dbcfbeca\">H<sub>a<\/sub>: p \u2260 that value (like in example 3).<\/p>\n<p id=\"a6c80a97c1a74ebdb9a28fea8133cbd3\">Note that in each example, the context made it clear which form of the alternative hypothesis was appropriate. The value specified in the null hypothesis is called the\u00a0<em class=\"italic\">null value<\/em>, and is generally denoted by p<sub>o<\/sub>. In general, therefore, the null hypothesis about the population proportion (p) takes the form:<\/p>\n<p id=\"fa509c65b5d84c478b1a10f7efb84711\">H<sub>o<\/sub>: p = p<sub>o<\/sub><\/p>\n<p id=\"a1ee5e6b2a5c494ca191f4b2cf7e2138\">We write H<sub>o<\/sub>: p = p<sub>o<\/sub>\u00a0to state the hypothesis that the population proportion has the value p<sub>o<\/sub>. In other words, p is the unknown population proportion and p<sub>o<\/sub>\u00a0is the number we think p might be for the given situation.<\/p>\n<p id=\"aaad80f916a14128a7982a540607935f\">The alternative hypothesis takes one of the following three forms (depending on the context):<\/p>\n<p id=\"e3f35d1a20964cca89e4fd1773c65686\">H<sub>a<\/sub>: p &lt; p<sub>o<\/sub>\u00a0<em class=\"italic\">(one-sided)<\/em><\/p>\n<p id=\"bde1c812903142ed874ea3a5e52f660d\">H<sub>a<\/sub>: p &gt; p<sub>o<\/sub>\u00a0<em class=\"italic\">(one-sided)<\/em><\/p>\n<p id=\"b24ddfd044204a02ad9e36c3d13654f9\">H<sub>a<\/sub>: p \u2260 p<sub>o<\/sub>\u00a0<em class=\"italic\">(two-sided)<\/em><\/p>\n<p id=\"def808dd167840b391efe741051b1ef8\">The first two forms of the alternative (where the = sign in H<sub>o<\/sub>\u00a0is challenged by &lt; or &gt;) are called\u00a0<em class=\"italic\">one-sided alternatives<\/em>, and the third form (where the = sign in H<sub>o<\/sub>\u00a0is challenged by \u2260) is called a\u00a0<em class=\"italic\">two-sided alternative<\/em>.\u00a0To understand the intuition behind these names, let\u2019s go back to our examples.<\/p>\n<p
id=\"c4167172cdb546ca86bc431ef41ec41b\">Example 3 (death penalty) is a case where we have a two-sided alternative:<\/p>\n<p id=\"a5a9dc1974c24055b7a52d3e7235fb31\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">o<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0p =.64 (No change from 2003).<\/p>\n<p id=\"e3f430512abb4171a72975f8ed93eb1a\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">a<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0p \u2260.64 (Some change since 2003).<\/p>\n<p id=\"e986b2c250714e16ba232d9be949218b\">In this case, in order to reject H<sub>o<\/sub>\u00a0and accept H<sub>a<\/sub>\u00a0we will need to get a sample proportion of death penalty supporters which is very different from .64\u00a0<em class=\"italic\">in either direction,<\/em>\u00a0either much larger or much smaller than .64.<\/p>\n<p id=\"e67f5e2294f14c13bdad42972b2fab41\">In example 2 (marijuana use) we have a one-sided alternative:<\/p>\n<p id=\"aa82464505bf45f8b875e8a1882cbc50\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">o<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0p = .157 (Same as among all college students in the country).<\/p>\n<p id=\"a52c39ddf41941b686bbe00605407041\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">a<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0p &gt; .157 (Higher than the national figure).<\/p>\n<p id=\"ca5871f8af784ef5b6b8e9d80e5ce51c\">Here, in order to reject H<sub>o<\/sub>\u00a0and accept H<sub>a<\/sub>\u00a0we will need to get a sample proportion of marijuana users which is much\u00a0<em class=\"italic\">higher<\/em>\u00a0than .157.<\/p>\n<p id=\"d1040674318448c3ba3c9f72393b4a66\">Similarly, in example 1 (defective products), where we are testing:<\/p>\n<p id=\"ed77295908ea46c4a167b56c10286b0f\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">o<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0p = .20 (No change; the repair did not help).<\/p>\n<p id=\"bf52650efd4d486cb16eb307bd4c3caf\"><em class=\"italic\">H<\/em><sub><em 
class=\"italic\">a<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0p &lt; .20 (The repair was effective).<\/p>\n<p id=\"fd5e6e882cbd4e8f880a853df0dd958f\">in order to reject H<sub>o<\/sub>\u00a0and accept H<sub>a<\/sub>, we will need to get a sample proportion of defective products that is much\u00a0<em class=\"italic\">smaller<\/em>\u00a0than .20.<\/p>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p id=\"da1f1d09db494d5f8d62557911a36238\">In each of the following examples, a test for the population proportion (p) is called for. You are asked to select the right null and alternative hypotheses.<\/p>\n<p id=\"e7d5064fc10148c18f2ff168751cd53f\"><em class=\"italic\">Scenario 1:\u00a0<\/em>The UCLA Internet Report (February 2003) estimated that roughly 8.7% of Internet users are extremely concerned about credit card fraud when buying online. Has that figure changed since? To test this, a random sample of 100 Internet users was chosen, and when interviewed, 10 said that they were extremely worried about credit card fraud when buying online. Let p be the proportion of all Internet users who are extremely concerned about credit card fraud.<\/p>\n<div id=\"h5p-164\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-164\" class=\"h5p-iframe\" data-content-id=\"164\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.2 Learn by doing 3\"><\/iframe><\/div>\n<\/div>\n<p><em class=\"italic\">Scenario 2:\u00a0<\/em>The UCLA Internet Report (February 2003) estimated that a proportion of roughly .75 of online homes are still using dial-up access, but claimed that the use of dial-up is declining. Is that really the case? To examine this, a follow-up study was conducted a year later in which, out of a random sample of 1,308 households that had Internet access, 804 were connecting using a dial-up modem.
Let p be the proportion of all U.S. Internet-using households that have dial-up access.<\/p>\n<div id=\"h5p-165\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-165\" class=\"h5p-iframe\" data-content-id=\"165\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.2 Learn by doing 4\"><\/iframe><\/div>\n<\/div>\n<p><em class=\"italic\">Scenario 3:\u00a0<\/em>According to the UCLA Internet Report (February 2003), the use of the Internet at home is growing steadily, and it is estimated that roughly 59.3% of households in the United States have Internet access at home. Has that trend continued since the report was released? To study this, a random sample of 1,200 households from a big metropolitan area was chosen for a more recent study, and it was found that 972 had an Internet connection. Let p be the proportion of U.S. households that have Internet access.<\/p>\n<div id=\"h5p-166\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-166\" class=\"h5p-iframe\" data-content-id=\"166\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.2 Learn by doing 5\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p id=\"ee1138e762374c23b62954f9649fa8f6\">In each of the following examples, a test for the population proportion (p) is called for. You are asked to select the right null and alternative hypotheses.<\/p>\n<p id=\"b88cca7da3174f5fb92196603651922c\"><em class=\"italic\">Scenario 1:<\/em>\u00a0When shirts are made, there can occasionally be defects (such as improper stitching).
But too many such defective shirts can be a sign of substandard manufacturing.<\/p>\n<p id=\"eba4fc456d9c43ccb9a9fbee2338d257\">Suppose, in the past, your favorite department store has had only one defective shirt per 200 shirts (a prior defective rate of only .005). But you suspect that the store has recently switched to a substandard manufacturer. So you decide to test to see if their overall proportion of defective shirts today is higher.<\/p>\n<p id=\"f8fa317ba4334f25829c1ff80dc0563d\">Suppose that, in a random sample of 200 shirts from the store, you find that 27 of them are defective, for a sample proportion of defective shirts of .135. You want to test whether this is evidence that the store is &#8220;guilty&#8221; of substandard manufacturing, compared to their prior rate of defective shirts.<\/p>\n<div id=\"h5p-167\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-167\" class=\"h5p-iframe\" data-content-id=\"167\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.2 Did I get this 2\"><\/iframe><\/div>\n<\/div>\n<p id=\"d9c5388913c5427b95c1e3db243d178f\"><em class=\"italic\">Scenario 2:<\/em>\u00a0It is a known medical fact that just slightly fewer females than males are born (although the reasons are not completely understood); the known &#8220;proper&#8221; baseline female birthrate is about 49% females.<\/p>\n<p id=\"ead7f3b3339f4d458acfd785df6438a8\">In some cultures, male children are traditionally looked on more favorably than female children, and there is concern that the increasing availability of ultrasound may lead to pregnant mothers deciding to abort the fetus if it\u2019s not the culturally &#8220;desired&#8221; gender. 
If this is happening, then the proportion of females in those nations would be significantly lower than the proper baseline rate.<\/p>\n<p id=\"e718a98a9fe24641bc5f223eeaec5329\">To test whether the proportion of females born in India is lower than the proper baseline female birthrate, a study investigates a random sample of 6,500 births from hospital files in India, and finds that 44.8% of the sampled births were female.<\/p>\n<div id=\"h5p-168\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-168\" class=\"h5p-iframe\" data-content-id=\"168\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.2 Did I get this 3\"><\/iframe><\/div>\n<\/div>\n<p id=\"e3c852487f2f42f29c4f5f2b5a0b74f2\"><em class=\"italic\">Scenario 3:<\/em>\u00a0A properly balanced 6-sided game die should give a 1 in exactly 1\/6 (about 16.7%) of all rolls. A casino wants to test its game die. If the die is not properly balanced one way or another, it could give either too many 1\u2019s or too few 1\u2019s, either of which could be bad.<\/p>\n<p id=\"de34e7a6eb3f49c0affc22dbf227b33d\">The casino wants to use the proportion of 1\u2019s to test whether the die is out of balance. So the casino test-rolls the die 60 times and gets a 1 in 9 of the rolls (15%).<\/p>\n<div id=\"h5p-169\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-169\" class=\"h5p-iframe\" data-content-id=\"169\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.2 Did I get this 4\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<h2><span title=\"Quick scroll up\">2.
Collecting and Summarizing the Data (Using a Test Statistic)<\/span><\/h2>\n<p id=\"adc378129b3b43e9b1a258185671a280\">After the hypotheses have been stated, the next step is to obtain a\u00a0<em class=\"italic\">sample<\/em>\u00a0(on which the inference will be based),\u00a0<em class=\"italic\">collect relevant data<\/em>, and\u00a0<em class=\"italic\">summarize<\/em>\u00a0them.<\/p>\n<p id=\"b19bec55934f4bada7dda9ecfa0f6f2a\">It is extremely important that our sample is representative of the population about which we want to draw conclusions. This is ensured when the sample is chosen at\u00a0<em class=\"italic\">random.<\/em>\u00a0Beyond the practical issue of ensuring representativeness, choosing a random sample has theoretical importance that we will mention later.<\/p>\n<p id=\"caeda4d28b7c413cb57fede3867d0a4e\">In the case of hypothesis testing for the population proportion (p), we will collect data on the relevant categorical variable from the individuals in the sample and start by calculating the sample proportion,\u00a0[latex]\\hat{p}[\/latex]\u00a0(the natural quantity to calculate when the parameter of interest is p).<\/p>\n<p id=\"efab1f8afd8946f18246144980e7cd0b\">Let\u2019s go back to our three examples and add this step to our figures.<\/p>\n<div id=\"cb64c0710e294728960efcbb9f5db270\" class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example 1<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"d6eee5c7dccd496fa22b11bb430adfa3\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population of products produced by the machine (following the repair). We want to know p about this population, or what is the proportion of defective products. 
The question we wish to answer is &amp;quot;is p still .20 or has it been reduced?&amp;quot; We take a sample of 400 products, represented by a smaller circle. We find that 64 of these are defective. p-hat = 64\/400 = .16\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image221.gif\" alt=\"A large circle represents the population of products produced by the machine (following the repair). We want to know p about this population, or what is the proportion of defective products. The question we wish to answer is &amp;quot;is p still .20 or has it been reduced?&amp;quot; We take a sample of 400 products, represented by a smaller circle. We find that 64 of these are defective. p-hat = 64\/400 = .16\" \/><\/span><\/span><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"fca2b1f104444259821e20b81c6c79c9\" class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example 2<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"cada7271b18a4ea6abafc85c4f8b64c3\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population Students at the college. We want to know p about this population, or what is the population proportion of students using marijuana. The question we wish to answer is &amp;quot;is p .157 (like the national figure) or higher?&amp;quot; We take a sample of 100 students, represented by a smaller circle. We find that 19 use marijuana. p-hat = 19\/100 = .19\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image222.gif\" alt=\"A large circle represents the population Students at the college. 
We want to know p about this population, or what is the population proportion of students using marijuana. The question we wish to answer is &amp;quot;is p .157 (like the national figure) or higher?&amp;quot; We take a sample of 100 students, represented by a smaller circle. We find that 19 use marijuana. p-hat = 19\/100 = .19\" \/><\/span><\/span><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"e84c10da140942b297cfeeb579122d16\" class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example 3<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"d7e870431bc34392b3c9effbe73ebe6f\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population US Adults. We want to know p about this population, which is population proportion which support the death penalty. The question we wish to answer is &amp;quot;has p changed since 2003 (when it was .64)?&amp;quot; We take a sample of 1000 US Adults, represented by a smaller circle. We find that 675 are in favor. p-hat = 675\/1000 = .675\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image223.gif\" alt=\"A large circle represents the population US Adults. We want to know p about this population, which is population proportion which support the death penalty. The question we wish to answer is &amp;quot;has p changed since 2003 (when it was .64)?&amp;quot; We take a sample of 1000 US Adults, represented by a smaller circle. We find that 675 are in favor. 
p-hat = 675\/1000 = .675\" \/><\/span><\/span><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p id=\"c48234d8f94b4219bc3a8fccdda9ec06\">As we mentioned earlier without going into detail, when we summarize the data in hypothesis testing, we go a step beyond calculating the sample statistic and summarize the data with a\u00a0<em class=\"italic\">test statistic<\/em>. Every test has a test statistic, which to some degree captures the essence of the test. In fact, the p-value, which so far we have looked upon as \u201cthe king\u201d (in the sense that everything is determined by it), is actually determined by (or derived from) the test statistic. We will now gradually introduce the test statistic.<\/p>\n<p id=\"cbe59fac565d4242bfc2e0bb36c2b657\">The test statistic is\u00a0<em class=\"italic\">a measure<\/em>\u00a0of how far the sample proportion\u00a0[latex]\\hat{p}[\/latex]\u00a0is from the null value\u00a0p<sub>0<\/sub>, the value that the null hypothesis claims is the value of p. In other words, since\u00a0[latex]\\hat{p}[\/latex]\u00a0is what the data estimate p to be, the test statistic can be viewed as a measure of the \u201cdistance\u201d between what the data tell us about p and what the null hypothesis claims p to be.<\/p>\n<p id=\"aa17be4cdf5e447fb1b7693c4264f862\">Let\u2019s use our examples to understand this:<\/p>\n<div id=\"d51808214b4840e59d40b7ccd690f880\" class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example 1<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div>\n<p id=\"ec6ed2b745e34d5f897b0cad26770e66\">The parameter of interest is p, the proportion of defective products following the repair.<\/p>\n<p id=\"b2b2acbe3ebd4bbeb58c0f701b0f4320\">The data estimate p to be\u00a0[latex]\\hat{p}=.16[\/latex].<\/p>\n<p id=\"af0389fe4da14dab94c6488d22e0ac66\">The null hypothesis claims that p = .20.<\/p>\n<p
id=\"e43c8b5aad1e47179f58687b7000b2e6\">The data are therefore .04 (or 4 percentage points) below the null hypothesis with respect to what they each tell us about p.<\/p>\n<p id=\"fe02f60b7f1f457da29d64c92e747170\">It is hard to evaluate whether this difference of 4 percentage points in defective products is enough evidence to say that the repair was effective, but clearly, the larger the difference, the stronger the evidence against the null hypothesis. So if our sample proportion of defective products had been, say, .10 instead of .16, then I think you would all agree that cutting the proportion of defective products in half (from 20% to 10%) would be extremely strong evidence that the repair was effective.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"e12cfe88fb7c4e88b023fdc3217f0671\" class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example 2<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div>\n<p id=\"c13163a9bbff41b0801fb544c6d25450\">The parameter of interest is p, the proportion of students in a college who use marijuana.<\/p>\n<p id=\"aada2df0a98d421badf65c62642c3400\">The data estimate p to be\u00a0[latex]\\hat{p}=.19[\/latex].<\/p>\n<p id=\"ac527ef74ba94e659c609d2a76860825\">The null hypothesis claims that p = .157.<\/p>\n<p id=\"df6f8935e8244b18a73b7d0ffa885217\">The data are therefore .033 (or 3.3 percentage points) above the null hypothesis with respect to what they each tell us about p.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"ebd97f453f8f4bdc9764fd9dea274bc8\" class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example 3<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div>\n<p id=\"cd3c3e78f8474807ac11dd2bce9872ed\">The parameter of interest is p, the proportion of U.S.
adults who support the death penalty for convicted murderers.<\/p>\n<p id=\"cd1921ba82ef4c9bb74052e9f1c50c26\">The data estimate p to be\u00a0[latex]\\hat{p}=.675[\/latex].<\/p>\n<p id=\"cb647ce7f2264fd68df5c21e2a56954a\">The null hypothesis claims that p = .64.<\/p>\n<p id=\"c99c4f21bcec4e689b961c3e3f89ff25\">The data are therefore .035 (or 3.5 percentage points) above the null hypothesis with respect to what they each tell us about p.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p id=\"b2042fbcc29d416cab4e2f35d01035bb\">There is a problem with looking only at the difference between the sample proportion\u00a0[latex]\\hat{p}[\/latex]\u00a0and the null value\u00a0p<sub>0<\/sub>.<\/p>\n<p id=\"d7e7b48c540e46fa981950b02f8502bc\">Examples 2 and 3 illustrate this problem very well.<\/p>\n<p id=\"a746195cf1814d06977f5babc6a1a8d6\">In example 2, we have a difference of 3.3 percentage points between the data and the null hypothesis, which is approximately the same as the difference of 3.5 percentage points in example 3.
However, the difference of 3.5 percentage points in example 3 is based on a\u00a0<em class=\"italic\">sample of size 1,000<\/em>\u00a0and is therefore much\u00a0<em class=\"italic\">more impressive<\/em>\u00a0than the difference of 3.3 percentage points in example 2, which was obtained from a sample of only 100.<\/p>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div id=\"h5p-170\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-170\" class=\"h5p-iframe\" data-content-id=\"170\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.2 Learn by doing 6\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<p id=\"a0dd297f213748b3b764c2373e4a42aa\">For the reason illustrated by examples 2 and 3, the test statistic cannot simply be the difference\u00a0[latex]\\hat{p}-p_{0}[\/latex], but must be a version of it that accounts for the sample size. In other words, we need to standardize the difference\u00a0[latex]\\hat{p}-p_{0}[\/latex]\u00a0so that different situations can be compared. We are very close to revealing the test statistic, but before we construct it, let\u2019s recall the following two facts from probability:<\/p>\n<p id=\"d9965edfe551454abfbf42220620bc0b\">1. When we take a random sample of size n from a population with population proportion p, the possible values of the sample proportion\u00a0[latex]\\hat{p}[\/latex]\u00a0(when certain conditions are met) have approximately a normal distribution with:<\/p>\n<p id=\"faa922e36b664a22938f86f76d04f504\">mean: p<\/p>\n<p>standard deviation:\u00a0[latex]\\sqrt{\\frac{p\\left(1-p\\right)}{n}}[\/latex]<\/p>\n<p id=\"eccbc94de1c74c378be8497bd48dc427\">2.
The z-score of a normal value (a value that comes from a normal distribution) is:<\/p>\n<p>[latex]z=\\frac{\\text{value}-\\text{mean}}{\\text{standard deviation}}[\/latex]<\/p>\n<p id=\"b7bb3568d36c4a11b48ecc58a58ebaf4\">and it represents how many standard deviations below or above the mean the value is.<\/p>\n<p id=\"fec1f4cf967042f18b9441e9e117f2b0\">We are finally ready to reveal the test statistic:<\/p>\n<p id=\"c51f2023a5334cada59f3592f6a7ca50\">The test statistic for this test measures the difference between the sample proportion\u00a0[latex]\\hat{p}[\/latex]\u00a0and the null value\u00a0p<sub>0<\/sub>\u00a0by the z-score (standardized score) of the sample proportion\u00a0[latex]\\hat{p}[\/latex], assuming that the null hypothesis is true (i.e., assuming that\u00a0[latex]p=p_{0}[\/latex]).<\/p>\n<p id=\"a79b6af46d2c479ea4c5494f61e6bf66\">From fact 1 (with p = p<sub>0<\/sub>), we know that the values of the sample proportion\u00a0[latex]\\hat{p}[\/latex]\u00a0are approximately normal, with mean p<sub>0<\/sub>\u00a0and standard deviation\u00a0[latex]\\sqrt{\\frac{p_{0}\\left(1-p_{0}\\right)}{n}}[\/latex].<\/p>\n<p id=\"c3c4d57100ab45debadd839fb3e0b348\">Using fact 2, we conclude that the z-score of\u00a0[latex]\\hat{p}[\/latex]\u00a0when\u00a0[latex]p=p_{0}[\/latex]\u00a0is:<\/p>\n<p>[latex]z=\\frac{\\hat{p}-p_{0}}{\\sqrt{\\frac{p_{0}\\left(1-p_{0}\\right)}{n}}}[\/latex]<\/p>\n<p id=\"aaad3420d435457aad88e35008c68bcc\"><em class=\"italic\">This is the test statistic.<\/em>\u00a0It represents the difference between the sample proportion ([latex]\\hat{p}[\/latex]) and the null value ([latex]p_{0}[\/latex]), measured in standard deviations.<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"bbfa1e0dd51a44e9a126a5d37e88694d\" class=\"img-responsive popimg aligncenter\" title=\"A normal curve representing the sampling distribution of p-hat assuming that p=p_0. Marked on the horizontal axis is p_0 and a particular value of p-hat.
z is the difference between p-hat and p_0 measured in standard deviations (with the sign of z indicating whether p-hat is below or above p_0)\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image237.gif\" alt=\"A normal curve representing the sampling distribution of p-hat assuming that p=p_0. Marked on the horizontal axis is p_0 and a particular value of p-hat. z is the difference between p-hat and p_0 measured in standard deviations (with the sign of z indicating whether p-hat is below or above p_0)\" \/><\/span><\/span><\/p>\n<p id=\"c5f1e7a14ea1485ab3b7bf305e8d801c\">Here is a representation of the sampling distribution of\u00a0[latex]\\hat{p}[\/latex], assuming p = p<sub>0<\/sub>.
In other words, this is a model of how the values of\u00a0[latex]\\hat{p}[\/latex]\u00a0behave when we draw random samples from a population for which H<sub>0<\/sub>\u00a0is true. Notice that the center of the sampling distribution is at p<sub>0<\/sub>, the hypothesized proportion given in the null hypothesis (H<sub>0<\/sub>: p = p<sub>0<\/sub>). We could also mark the axis in standard deviation units, where one standard deviation is\u00a0[latex]\\sqrt{\\frac{p_{0}\\left(1-p_{0}\\right)}{n}}[\/latex]. For example, if our null hypothesis claims that the proportion of U.S. adults supporting the death penalty is 0.64, then the sampling distribution is drawn as if the null is true: a normal distribution centered at 0.64 with a standard deviation that depends on the sample size, [latex]\\sqrt{\\frac{0.64(1-0.64)}{n}}[\/latex]. With the sample of n = 1,000 adults in Example 3, this standard deviation is about 0.015.<\/p>\n<div id=\"bc9eff87f72c41ed979bf99005ac4c76\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Important Comment<\/span><\/h2>\n<p id=\"f0e0381cf06048aeade4d8b4e3091273\">Note that under the assumption that H<sub>0<\/sub>\u00a0is true (i.e.,\u00a0[latex]p=p_{0}[\/latex]), the test statistic, because it is a z-score, has approximately a standard normal, N(0,1), distribution.
A common way to say the same thing is: \u201cThe null distribution of the test statistic is N(0,1).\u201d By \u201cnull distribution,\u201d we mean the distribution under the assumption that H<sub>0<\/sub>\u00a0is true. As we\u2019ll stress again later, the calculation of the p-value is based on the null distribution of the test statistic.<\/p>\n<\/div>\n<\/div>\n<p id=\"e21a438e7b604cb58cc3db63a9e8bee5\">Let\u2019s go back to our three examples and find the test statistic in each case:<\/p>\n<div id=\"a1898c10c25c4490b1c846fc718cffab\" class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example 1<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"fe376f2393d1442abae7b9901f2cd2eb\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population of products produced by the machine (following the repair). We want to know p for this population, the proportion of defective products. The question we wish to answer is &quot;is p still 0.20 or has it been reduced?&quot; We take a sample of 400 products, represented by a smaller circle. We find that 64 of these are defective. p-hat = 64\/400 = 0.16, and z = -2.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image238.gif\" alt=\"A large circle represents the population of products produced by the machine (following the repair). We want to know p for this population, the proportion of defective products. The question we wish to answer is &quot;is p still 0.20 or has it been reduced?&quot; We take a sample of 400 products, represented by a smaller circle. We find that 64 of these are defective. 
p-hat = 64\/400 = 0.16, and z = -2.\" \/><\/span><\/span><\/p>\n<p id=\"d5e63f4ec5444a56b5a181f0ce762260\">Since the null hypothesis is H<sub>0<\/sub>: p = 0.20, the standardized score of\u00a0[latex]\\hat{p}=.16[\/latex]\u00a0is:\u00a0[latex]z=\\frac{.16-.20}{\\sqrt{\\frac{.20\\left(1-.20\\right)}{400}}}=-2[\/latex]<\/p>\n<p id=\"cccaf387ae5c4e2798a67813f3bcfa48\">This is the value of the test statistic for this example.<\/p>\n<p id=\"b703630313bb43a2a81e3041240c269a\">What does this tell us?<\/p>\n<p id=\"e0ec45140d2b4a10b1bf0576e2c90341\">This z-score of \u22122 tells us that (assuming that H<sub>0<\/sub>\u00a0is true) the sample proportion\u00a0[latex]\\hat{p}=.16[\/latex]\u00a0is 2 standard deviations below the null value (0.20).<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"d362897edcb748a69d7ee921aec0a8eb\" class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example 2<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"e7458122c2aa458f88586f110282d431\" class=\"img-responsive popimg\" title=\"A large circle represents the population of students at the college. We want to know p for this population, the proportion of students using marijuana. The question we wish to answer is &quot;is p .157 (like the national figure) or higher?&quot; We take a sample of 100 students, represented by a smaller circle. We find that 19 use marijuana. p-hat = 19\/100 = 0.19, and z = 0.91\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image241.gif\" alt=\"A large circle represents the population of students at the college. We want to know p for this population, the proportion of students using marijuana. 
The question we wish to answer is &quot;is p .157 (like the national figure) or higher?&quot; We take a sample of 100 students, represented by a smaller circle. We find that 19 use marijuana. p-hat = 19\/100 = 0.19, and z = 0.91\" \/><\/span><\/span><\/p>\n<p id=\"b40f0100985e44fd9885905b7057b7ac\">Since the null hypothesis is H<sub>0<\/sub>: p = 0.157, the standardized score of\u00a0[latex]\\hat{p}=.19[\/latex]\u00a0is:\u00a0[latex]z=\\frac{.19-.157}{\\sqrt{\\frac{.157\\left(1-.157\\right)}{100}}}\\approx.91[\/latex].<\/p>\n<p id=\"b7f6ea63d42543babbf8609bacd758e0\">This is the value of the test statistic for this example.<\/p>\n<p id=\"f0d424974e994c3cb178ad300ac4010f\">We interpret this to mean that, assuming that H<sub>0<\/sub>\u00a0is true, the sample proportion\u00a0[latex]\\hat{p}=.19[\/latex]\u00a0is 0.91 standard deviations above the null value (0.157).<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"bb48e79a3d5d4a669663acca7caa56d8\" class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example 3<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"a2b37da0c9ed42e3b036b15b76e651c7\" class=\"img-responsive popimg\" title=\"A large circle represents the population of U.S. adults. We want to know p for this population, the proportion who support the death penalty. The question we wish to answer is &quot;has p changed since 2003 (when it was 0.64)?&quot; We take a sample of 1000 US adults, represented by a smaller circle. We find that 675 are in favor. p-hat = 675\/1000 = 0.675, and z = 2.31\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image244.gif\" alt=\"A large circle represents the population of U.S. adults. We want to know p for this population, the proportion who support the death penalty. 
The question we wish to answer is &quot;has p changed since 2003 (when it was 0.64)?&quot; We take a sample of 1000 US adults, represented by a smaller circle. We find that 675 are in favor. p-hat = 675\/1000 = 0.675, and z = 2.31\" \/><\/span><\/span><\/p>\n<p id=\"b9cac5a4a24c4dc8bb353b73ebe8ac86\">Since the null hypothesis is H<sub>0<\/sub>: p = 0.64, the standardized score of\u00a0[latex]\\hat{p}=.675[\/latex]\u00a0is:\u00a0[latex]z=\\frac{.675-.64}{\\sqrt{\\frac{.64\\left(1-.64\\right)}{1000}}}\\approx2.31[\/latex].<\/p>\n<p id=\"bc67b26077bd4dc8b099b826961a7b50\">This is the value of the test statistic for this example.<\/p>\n<p id=\"b5f083ee64994674a33ada5ba2ecc05b\">We interpret this to mean that, assuming that H<sub>0<\/sub>\u00a0is true, the sample proportion\u00a0[latex]\\hat{p}=.675[\/latex]\u00a0is 2.31 standard deviations above the null value (0.64).<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p>We think that the most common color for automobiles is silver and that 24% of all automobiles sold are silver. We take a random sample of 225 cars and find that 63 of them are silver.<\/p>\n<div id=\"h5p-171\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-171\" class=\"h5p-iframe\" data-content-id=\"171\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.2 Learn by doing 7\"><\/iframe><\/div>\n<\/div>\n<p>If we take a different random sample and get a test statistic of zero, what can we conclude? 
Mark each statement as true or false.<\/p>\n<div id=\"h5p-172\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-172\" class=\"h5p-iframe\" data-content-id=\"172\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.2 Learn by doing 8\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<h2><span title=\"Quick scroll up\">Comments about the Test Statistic<\/span><\/h2>\n<p id=\"b880260189f949dba35350988036dbec\">1. We mentioned earlier that, to some degree, the test statistic captures the essence of the test. In this case, the test statistic measures the difference between\u00a0[latex]\\hat{p}[\/latex]\u00a0and\u00a0[latex]p_{0}[\/latex]\u00a0in standard deviations. This is exactly what this test is about: we collect data and look at the discrepancy between what the data estimate p to be (represented by\u00a0[latex]\\hat{p}[\/latex]) and what H<sub>0<\/sub>\u00a0claims about p (represented by\u00a0[latex]p_{0}[\/latex]).<\/p>\n<p id=\"bd6fc1ea455449cb962d3496a751f275\">2. You can think about this test statistic as a measure of the evidence in the data against H<sub>0<\/sub>. 
The larger the test statistic in magnitude (positive or negative), the \u201cfurther the data are from H<sub>0<\/sub>\u201d and therefore the more evidence the data provide against H<sub>0<\/sub>.<\/p>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p id=\"ab8b39f7c5674357aa5db9ace49d0bd0\">The UCLA Internet Report (February 2003) estimated that roughly 75% of online homes were still using dial-up access, but claimed that the use of dial-up was declining. Is that really the case? To examine this, a follow-up study was conducted a year later in which, out of a random sample of 1,308 households that had Internet access, 804 were connecting using a dial-up modem.<\/p>\n<p id=\"bc62bc823a1d4e52ab2af40c87870515\">Let p be the proportion of all U.S. Internet-using households who have dial-up access. In the previous activity, we established that the appropriate hypotheses here are:<\/p>\n<p id=\"b096bc8aaf0b49428c7734fa66dc7b8f\">H<sub>0<\/sub>: p = 0.75 and H<sub>a<\/sub>: p &lt; 0.75<\/p>\n<div class=\"asx\">\n<div id=\"du4_m2_testprop4_tutor3\" class=\"activitywrap sectionNest flash\">\n<div class=\"activityhead\">\n<div class=\"activityinfo\"><\/div>\n<\/div>\n<div class=\"actContain\">\n<div class=\"activity flash\">\n<div id=\"u4_m2_testprop4_tutor3\" class=\"flash_obj asx testFlash mark_flash\">\n<div id=\"ou4_m2_testprop4_tutor3\" class=\"page 2962884\">\n<div id=\"2962884\" class=\"question\">\n<div>Based on the data, what is the sample proportion of Internet households that use a dial-up connection?<\/div>\n<\/div>\n<div>\n<div id=\"h5p-173\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-173\" class=\"h5p-iframe\" data-content-id=\"173\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.2 Did I get this 5\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<div>\n<div id=\"h5p-174\">\n<div 
class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-174\" class=\"h5p-iframe\" data-content-id=\"174\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.2 Did I get this 6\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<div><\/div>\n<\/div>\n<\/div>\n<div>Ann and Sam are both testing the hypothesis that 40% of plain M&amp;M\u2019s are orange, H<sub>0<\/sub>: p = 0.40. Ann draws a sample of M&amp;M\u2019s and 45% of her sample are orange. She calculates a test statistic of z = 1.25. Sam draws a sample of M&amp;M\u2019s and 50% of his sample are orange. He calculates a test statistic of z = 1.<\/div>\n<div>What can we conclude? Mark each statement as true or false.<\/div>\n<div>\n<div id=\"h5p-175\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-175\" class=\"h5p-iframe\" data-content-id=\"175\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.2 Did I get this 7\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<h2><span title=\"Quick scroll up\">Comments<\/span><\/h2>\n<ol>\n<li style=\"list-style-type: none\">\n<ol>\n<li>\n<p id=\"b7c0b845b8d24bea9e2a398e26c5be5d\">It should now be clear why this test is commonly known as\u00a0<em class=\"italic\">the z-test for the population proportion<\/em>. The name comes from the fact that it is based on a test statistic that is a\u00a0<em class=\"italic\">z-score.<\/em><\/p>\n<\/li>\n<li>\n<p id=\"c61582f93643441c9bf0a647ee1c51ee\">Recall fact 1 that we used for constructing the z-test statistic. 
Here is part of it again:<\/p>\n<p id=\"d584b12403f44ff4836e94feb59df391\">When we take a\u00a0<em class=\"italic\">random<\/em>\u00a0sample of size n from a population with population proportion p, the possible values of the sample proportion ([latex]\\hat{p}[\/latex]) (<em class=\"italic\">when certain conditions are met<\/em>) have approximately a normal distribution with a mean of \u2026 and a standard deviation of \u2026.<\/p>\n<p id=\"c2244c33b8b74e8aa1609a584fef0013\">This result provides the theoretical justification for constructing the test statistic the way we did, and therefore the assumptions under which this result holds (in italics above) are the conditions that our data need to satisfy so that we can use this test. These two conditions are:<\/p>\n<ol id=\"e9b9af0e22524967ac4bc7f27666e28b\" class=\"lower-roman\">\n<li>\n<p id=\"f4df1bbb543e440f80a81d8b80ee4800\">The sample has to be random.<\/p>\n<\/li>\n<li>\n<p id=\"ecc0ce2df03240818a375f6051ada188\">The conditions under which the sampling distribution of\u00a0[latex]\\hat{p}[\/latex]\u00a0is approximately normal are met. In other words:<\/p>\n<\/li>\n<\/ol>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"bf69f9e01c6a4afaa5fde0d2811fdf7d\" class=\"img-responsive popimg aligncenter\" title=\"n \u00d7 p_0 \u2265 10 and n \u00d7 (1 - p_0) \u2265 10\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image248.gif\" alt=\"n \u00d7 p_0 \u2265 10 and n \u00d7 (1 - p_0) \u2265 10\" \/><\/span><\/span><\/p>\n<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<ol id=\"ed9e074a3a4a4e2fad1211db9b4649de\">\n<li><span class=\"imagewrap\"><span class=\"image\">Here we will pause to say more about condition (i) above, the need for a random sample. 
In the Probability Unit we discussed sampling plans based on probability (such as simple random, cluster, or stratified sampling) that produce an unbiased sample, which can safely be used to make inferences about a population. We also noted that, in practice, other (non-random) sampling techniques are sometimes used when random sampling is not feasible. When these techniques are used, however, it is important to be aware of the type of bias they introduce, and thus of the limitations of the conclusions that can be drawn from them.<\/span><\/span><\/li>\n<\/ol>\n<p id=\"fdad1b74ea8d4dac8a6c5e2a0a7f8368\">For our purposes here, we will focus on one such practice: the situation in which a sample is not truly chosen at random but, in the context of the categorical variable being studied, can reasonably be regarded as random. For example, say that you are interested in the proportion of students at a certain college who suffer from seasonal allergies. For that purpose, the students in a large engineering class could be considered a random sample, since there is nothing about being in an engineering class that makes you more or less likely to suffer from seasonal allergies. Technically, the engineering class is a convenience sample, but it can be treated as a random sample in the context of this categorical variable. On the other hand, if you are interested in the proportion of students at the college who have math anxiety, then the engineering class clearly could not be viewed as a random sample, since engineering students probably have a much lower incidence of math anxiety than the college population overall.<\/p>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p>We are conducting a survey to determine whether an upcoming bond measure will receive a majority vote in the county. 
The null hypothesis claims that p = 0.50, where p is the proportion of registered voters in the county who say they support the bond measure.<\/p>\n<div id=\"h5p-176\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-176\" class=\"h5p-iframe\" data-content-id=\"176\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.2 Learn by doing 9\"><\/iframe><\/div>\n<\/div>\n<p>Let&#8217;s check the conditions in our three examples.<\/p>\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h4 class=\"textbox__title\">Example 1<\/h4>\n<\/header>\n<div class=\"textbox__content\">\n<ol id=\"ccb624427c0f4c778e312c5bf95d1d2f\" class=\"lower-roman\">\n<li>\n<p id=\"cfda7bb5cc1d44e6abfa0ff059fbf01f\">The 400 products were chosen at random.<\/p>\n<\/li>\n<li>\n<p id=\"cedd5e481e1f493689a14abfc15004b3\">n = 400,\u00a0[latex]p_{0}=.2[\/latex], and therefore:<\/p>\n<\/li>\n<\/ol>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"b33c5e0598d5432589844f47e5b4a576\" class=\"img-responsive popimg aligncenter\" title=\"* n \u00d7 p_0 = 80 
\u2265 10 * n \u00d7 (1 - p_0) = 320 \u2265 10\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image250.gif\" alt=\"* n \u00d7 p_0 = 80 \u2265 10 * n \u00d7 (1 - p_0) = 320 \u2265 10\" \/><\/span><\/span><\/p>\n<\/div>\n<\/div>\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h4 class=\"textbox__title\">Example 2<\/h4>\n<\/header>\n<div class=\"textbox__content\">\n<ol id=\"c733d14354894e20b4c2b73502e69a28\" class=\"lower-roman\">\n<li>\n<p id=\"d7adcef89b4d4ef6af0c30ef26dc1b92\">The 100 students were chosen at random.<\/p>\n<\/li>\n<li>\n<p id=\"c476cb4d438046128c0263fb7628bb87\">n = 100,\u00a0[latex]p_{0}=.157[\/latex], and therefore:<\/p>\n<\/li>\n<\/ol>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"c949fe1aa614458496ca8467b02d7bf0\" class=\"img-responsive popimg aligncenter\" title=\"* n \u00d7 p_0 = 15.7 \u2265 10 * n \u00d7 (1 - p_0) = 84.3 \u2265 10\" 
src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image252.gif\" alt=\"* n \u00d7 p_0 = 15.7 \u2265 10 * n \u00d7 (1 - p_0) = 84.3 \u2265 10\" \/><\/span><\/span><\/p>\n<\/div>\n<\/div>\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h4 class=\"textbox__title\">Example 3<\/h4>\n<\/header>\n<div class=\"textbox__content\">\n<ol id=\"cea6e494e5eb4111b477a10c9d7a3735\" class=\"lower-roman\">\n<li>\n<p id=\"c6c861ea23ae48d8985e42d08e6e588d\">The 1,000 U.S. adults were chosen at random.<\/p>\n<\/li>\n<li>\n<p id=\"b0cffdf4cccf46858b0f9b0938514bdd\">n = 1,000,\u00a0[latex]p_{0}=.64[\/latex], and therefore:<\/p>\n<\/li>\n<\/ol>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"adc052d024464c328c1ad1ba6c55ba21\" class=\"img-responsive popimg aligncenter\" title=\"* n \u00d7 p_0 = 640 \u2265 10 * n \u00d7 (1 - p_0) = 360 \u2265 10\" 
src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image254.gif\" alt=\"* n \u00d7 p_0 = 640 \u2265 10 * n \u00d7 (1 - p_0) = 360 \u2265 10\" \/><\/span><\/span><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p id=\"b3c1c0895a3c48448db3da0f5fe3cee3\">In each of the following scenarios, you need to decide whether it is appropriate to use the z-test for the population proportion p, and if not, which condition is violated.<\/p>\n<div id=\"h5p-177\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-177\" class=\"h5p-iframe\" data-content-id=\"177\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.2 Learn by doing 10\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<p>Checking that our data satisfy the conditions under which the test can be reliably used is a very important part of the hypothesis testing process. So far we haven\u2019t explicitly included it in the 4-step process of hypothesis testing, but now that we are discussing a specific test, you can see how it fits into the process. 
We are therefore now going to amend our 4-step process of hypothesis testing to include this essential part of the process.<\/p>\n<div id=\"c0dd4d7b108f4541a1b10fb9dc99e326\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">The Four Steps in Hypothesis Testing<\/span><\/h2>\n<ol id=\"db37e2bc660248e6a3d1f63f26d4cf35\">\n<li>\n<p id=\"e379f2ed78cb4b09b9bae94d9fd4340f\">State the appropriate null and alternative hypotheses, H<sub>0<\/sub>\u00a0and H<sub>a<\/sub>.<\/p>\n<\/li>\n<li>\n<p id=\"bd14c60503d349dabb31c3157373d576\">Obtain a random sample, collect relevant data, and\u00a0<em class=\"italic\">check whether the data meet the conditions under which the test can be used<\/em>. If the conditions are met, summarize the data using a test statistic.<\/p>\n<\/li>\n<li>\n<p id=\"d2f4eca67afe43de9129e459c5d1d23e\">Find the p-value of the test.<\/p>\n<\/li>\n<li>\n<p id=\"d60cac955f9543d08bae12e14d9b8d45\">Based on the p-value, decide whether or not the results are significant and\u00a0<em class=\"italic\">draw your conclusions in context.<\/em><\/p>\n<\/li>\n<\/ol>\n<p id=\"e8a9effc7c754ef6b9bf5c505174abb5\">With respect to the z-test for the population proportion that we are currently discussing:<\/p>\n<p id=\"b1550426a1984a8383c3a227b37480b9\">Step 1: Completed<\/p>\n<p id=\"b7b46e8455614f5d900183d824c2f49e\">Step 2: Completed<\/p>\n<p id=\"f2dd64b7d32c47e6b6a414b25ffc3e06\">Step 3: This is what we will work on next.<\/p>\n<hr \/>\n<div id=\"e1bbb3f1991e470383c9ac98867ac272\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">3. Finding the P-value of the Test<\/span><\/h2>\n<p id=\"ff9f322a67794311b756bf47ae9cef33\">So far we\u2019ve talked about the p-value at the intuitive level: understanding what it is (or what it measures) and how we use it to draw conclusions about the significance of our results. 
We will now look more closely at how the p-value is calculated.<\/p>\n<p id=\"c3da15eff0eb4553a55a2f52fc88aca7\">Eventually we will rely on technology to calculate the p-value (as well as the test statistic) for us, but in order to make intelligent use of the output, it is important to first\u00a0<em class=\"italic\">understand<\/em>\u00a0the details, and only then let the computer do the calculations. Let\u2019s start.<\/p>\n<p id=\"fa64c70fb0dc4259af6bd90bd483c7b7\">Recall that so far we have said that the p-value is the probability of obtaining data like those observed assuming that H<sub>0<\/sub>\u00a0is true. Like the test statistic, the p-value is a measure of the evidence against H<sub>0<\/sub>. In the case of the\u00a0<em class=\"italic\">test statistic,<\/em>\u00a0the\u00a0<em class=\"italic\">larger<\/em> it is in magnitude (positive or negative), the further [latex]\\hat{p}[\/latex] is from <em><strong>p<sub>0<\/sub><\/strong><\/em>, and the\u00a0<em class=\"italic\">more evidence we have against H<\/em><sub><em class=\"italic\">0<\/em><\/sub><em class=\"italic\">.\u00a0<\/em>In the case of the\u00a0<em class=\"italic\">p-value<\/em>, it is the opposite: the\u00a0<em class=\"italic\">smaller<\/em>\u00a0it is, the less likely it is to get data like those observed when H<sub>0<\/sub>\u00a0is true, and the\u00a0<em class=\"italic\">more evidence it provides against H<\/em><sub><em class=\"italic\">0<\/em><\/sub>. One can actually draw conclusions in hypothesis testing using just the test statistic, and, as we\u2019ll see, the p-value is in a sense just another way of looking at the test statistic. 
We take the extra step in this course of deriving the p-value from the test statistic because, although the value of the test statistic has a clear and intuitive interpretation in this case (the test about the population proportion) and in some other tests, in other tests its value is not as easy to interpret. The p-value, on the other hand, keeps its intuitive appeal across all statistical tests.<\/p>\n<p id=\"d10030a04d51418e867e4787ef1f43fe\"><em class=\"italic\">How is the p-value calculated?<\/em><\/p>\n<p id=\"dfe4effa5b6440fe935f40cb47f1a41a\">Intuitively, the p-value is the\u00a0<em class=\"italic\">probability<\/em>\u00a0of observing\u00a0<em class=\"italic\">data like those observed<\/em>\u00a0assuming that H<sub>o<\/sub>\u00a0is true. Let\u2019s be a bit more formal:<\/p>\n<ul id=\"b844b1bd069e469d964a8efcd50f2cb1\">\n<li>\n<p id=\"e09aa657fb7f4d88bb27b374df321ab8\">Since this is a probability question about the\u00a0<em class=\"italic\">data<\/em>, it makes sense that the calculation will involve the data summary, the\u00a0<em class=\"italic\">test statistic.<\/em><\/p>\n<\/li>\n<li>\n<p id=\"a95f87438dc94f779759d29a6dbafe58\">What do we mean by\u00a0<em class=\"italic\">\u201clike\u201d<\/em>\u00a0those observed? 
By \u201clike\u201d we mean\u00a0<em class=\"italic\">\u201cas extreme or even more extreme.\u201d<\/em><\/p>\n<\/li>\n<\/ul>\n<p id=\"bd23865caeea48a2afa430b4aaefd2bd\">Putting it all together, we get that in\u00a0<em class=\"italic\">general:<\/em><\/p>\n<p id=\"a59957028a7a4d8f9fc03a7494a23441\"><em class=\"italic\">The p-value is the probability of observing a test statistic as extreme as that observed (or even more extreme) assuming that the null hypothesis is true.<\/em><\/p>\n<\/div>\n<\/div>\n<div id=\"ca3c4d07c39f453e9aab825883ea5ccd\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Comment<\/span><\/h2>\n<p id=\"ae98732c1dc64f2abe0a4d24b961d9a4\">By\u00a0<em class=\"italic\">\u201cextreme\u201d<\/em>\u00a0we mean extreme\u00a0<em class=\"italic\">in the direction of the alternative<\/em>\u00a0hypothesis.<\/p>\n<\/div>\n<\/div>\n<p id=\"c22952ba65914afc9c48481df4b3254b\"><em class=\"italic\">Specifically<\/em>, for the z-test for the population proportion:<\/p>\n<ol id=\"d525ad6b844b49009f80855371bf6c26\">\n<li>\n<p id=\"f6819d6f76b14971a23703943457a244\">If the alternative hypothesis is <em><strong>H<sub>a<\/sub> : p &lt; p<sub>0<\/sub><\/strong><\/em>\u00a0\u00a0(<em class=\"italic\">less<\/em>\u00a0than), then \u201cextreme\u201d means\u00a0<em class=\"italic\">small<\/em>, and the p-value is:<\/p>\n<p id=\"cb0ffa5351994515bf9320e5c6e8bc67\">The probability of observing a test statistic\u00a0<em class=\"italic\">as small as that observed or smaller<\/em>\u00a0if the null hypothesis is true.<\/p>\n<\/li>\n<li>\n<p id=\"e17d3c401a304c138a8504a75ff5fb34\">If the alternative hypothesis is <em><strong>H<sub>a<\/sub> : p &gt; p<sub>0<\/sub><\/strong><\/em> (<em class=\"italic\">greater<\/em>\u00a0than), then \u201cextreme\u201d means\u00a0<em class=\"italic\">large<\/em>, and the p-value is:<\/p>\n<p id=\"d819367b7e3e40f8901d72d2c0982b9d\">The probability of observing a test statistic\u00a0<em class=\"italic\">as large 
as that observed or larger<\/em>\u00a0if the null hypothesis is true.<\/p>\n<\/li>\n<li>\n<p id=\"f3b62e0ac469406d9b9a68c5f161a4cb\">If the alternative is\u00a0[latex]H_{a}:p\\neq p_{0}[\/latex]\u00a0(<em class=\"italic\">different<\/em>\u00a0from), then \u201cextreme\u201d means extreme in either direction,\u00a0<em class=\"italic\">either small or large (i.e., large in magnitude)<\/em>, and the p-value therefore is:<\/p>\n<p id=\"e548ec3a0b874d8ea67466b123f47f99\">The probability of observing a test statistic\u00a0<em class=\"italic\">as large in magnitude as that observed or larger<\/em>\u00a0if the null hypothesis is true.<\/p>\n<\/li>\n<\/ol>\n<p id=\"a17cdf8a52284207b7e0bb86d80db43e\">(Examples for this two-sided alternative: If z = -2.5, the p-value is the probability of observing a test statistic as small as -2.5 or smaller, or as large as 2.5 or larger.<\/p>\n<p id=\"cdfefb7075114aa38bb46a0d00c8a1d8\">If z = 1.5, the p-value is the probability of observing a test statistic as large as 1.5 or larger, or as small as -1.5 or smaller.)<\/p>\n<p id=\"d30ad952fff94014a2fd975ca0416ab6\"><em class=\"italic\">OK, that makes sense. But how do we actually calculate it?<\/em><\/p>\n<p id=\"a431c2b26a80412c8c5f1784976d1e43\">Recall the important comment from our discussion about our test statistic,<\/p>\n<p>[latex]z=\\frac{\\hat{p}-p_0}{\\sqrt{\\frac{p_0\\left(1-p_0\\right)}{n}}}[\/latex]<\/p>\n<p id=\"e8df62deee7a4c5281277dd5c37bb907\">which said that when the null hypothesis is true (i.e., when <em><strong>p = p<sub>0<\/sub><\/strong><\/em>), the possible values of our test statistic (because it is a z-score) follow a standard normal (N(0,1), denoted by Z) distribution. 
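<\/p>\n<p>Since the null distribution of z is N(0,1), each p-value is an area under the standard normal curve, and this can be sketched in a few lines of Python. This is a minimal illustration under our own assumptions, not the statistical software used in this course: the function names <code>z_statistic<\/code>, <code>std_normal_cdf<\/code> and <code>p_value<\/code> and the labels <code>'less'<\/code>, <code>'greater'<\/code> and <code>'two-sided'<\/code> are hypothetical choices for this sketch, and the standard normal probability P(Z \u2264 x) is computed with the complementary error function <code>erfc<\/code> from the Python <code>math<\/code> module instead of a table.<\/p>

```python
from math import erfc, sqrt


def z_statistic(p_hat, p0, n):
    # z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    # The standard deviation uses p0, the value of p claimed by Ho.
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


def std_normal_cdf(x):
    # P(Z <= x) for Z ~ N(0,1), via the identity Phi(x) = erfc(-x / sqrt(2)) / 2
    return 0.5 * erfc(-x / sqrt(2))


def p_value(z, alternative):
    # Hypothetical labels for this sketch: 'less', 'greater', 'two-sided'
    if alternative == 'less':       # Ha: p < p0, left tail: P(Z <= z)
        return std_normal_cdf(z)
    if alternative == 'greater':    # Ha: p > p0, right tail: P(Z >= z) = P(Z <= -z)
        return std_normal_cdf(-z)
    if alternative == 'two-sided':  # Ha: p != p0, both tails: 2 * P(Z >= |z|)
        return 2 * std_normal_cdf(-abs(z))
    raise ValueError('alternative must be less, greater, or two-sided')


# The two-sided examples from the text: z = -2.5 and z = 1.5
for z in (-2.5, 1.5):
    print(z, round(p_value(z, 'two-sided'), 4))
```

<p>Because the standard normal curve is symmetric about 0, the area to the right of z equals the area to the left of -z, so the sketch handles all three alternatives with the same cumulative probability function. For the two examples above (z = -2.5 and z = 1.5), the two-sided p-values come out to roughly 0.012 and 0.134.<\/p>\n<p>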
Therefore, the p-value calculations (which assume that H<sub>o<\/sub>\u00a0is true) are simply standard normal distribution calculations for the three possible alternative hypotheses.<\/p>\n<div id=\"cb917aee58754a5cb3636aefb3294a0d\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Less Than<\/span><\/h2>\n<p id=\"df777795528345dba0210ed2e4428e54\">The probability of observing a test statistic as\u00a0<em class=\"italic\">small as that observed or smaller<\/em>, assuming that the values of the test statistic follow a standard normal distribution. We will now represent this probability in symbols and as an area under the normal curve.<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"d0990c81beb04f838e356df68e500e1c\" class=\"img-responsive popimg aligncenter\" title=\"Ha: p &lt; p_0 \u21d2 p-value = P(Z \u2264 z)\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image258.gif\" alt=\"Ha: p &lt; p_0 \u21d2 p-value = P(Z \u2264 z)\" \/><\/span><\/span><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"f3d076b51fb242d9b0b06eb29e9b968a\" class=\"img-responsive popimg aligncenter\" title=\"A normal distribution curve (N(0,1)). Marked on the horizontal axis are z-scores of 0 and z. z is to the left of 0 because the observed sample proportion is smaller than p_0. The p-value is the area to the left of z under the curve.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image259.gif\" alt=\"A normal distribution curve (N(0,1)). Marked on the horizontal axis are z-scores of 0 and z. z is to the left of 0 because the observed sample proportion is smaller than p_0. 
The p-value is the area to the left of z under the curve.\" \/><\/span><\/span><\/p>\n<p id=\"ea6ab08581a74cf185efde2813135049\">Looking at the shaded region, you can see why this is often referred to as a\u00a0<em class=\"italic\">left-tailed<\/em>\u00a0test. We shade the area to the left of the test statistic because \u201cless than\u201d points to the left.<\/p>\n<\/div>\n<\/div>\n<div id=\"d67cd76392ac4618bac7379972df6482\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Greater Than<\/span><\/h2>\n<p id=\"e0257e542a344125b2bd99d92c4f8b24\">The probability of observing a test statistic as\u00a0<em class=\"italic\">large as that observed or larger<\/em>, assuming that the values of the test statistic follow a standard normal distribution. Again, we will represent this probability in symbols and as an area under the normal curve.<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"bbc3b4575bdf41b9a1a617658bb4d509\" class=\"img-responsive popimg aligncenter\" title=\"Ha: p &gt; p_0 \u21d2 p-value = P(Z \u2265 z)\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image260.gif\" alt=\"Ha: p &gt; p_0 \u21d2 p-value = P(Z \u2265 z)\" \/><\/span><\/span><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"d92d63e7f69f4904bae4b6b212a8e34c\" class=\"img-responsive popimg aligncenter\" title=\"A normal distribution curve (N(0,1)). Marked on the horizontal axis are z-scores of 0 and z. z is to the right of 0 because the observed sample proportion is larger than p_0. The p-value is the area to the right of z under the curve.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image261.gif\" alt=\"A normal distribution curve (N(0,1)). Marked on the horizontal axis are z-scores of 0 and z. 
z is to the right of 0 because the observed sample proportion is larger than p_0. The p-value is the area to the right of z under the curve.\" \/><\/span><\/span><\/p>\n<p id=\"bac6416321b4470f9f4de01f215df92e\">Looking at the shaded region, you can see why this is often referred to as a\u00a0<em class=\"italic\">right-tailed<\/em>\u00a0test. We shade the area to the right of the test statistic because \u201cgreater than\u201d points to the right.<\/p>\n<\/div>\n<\/div>\n<div id=\"e29acb0c945a4833b2d0137481005a12\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Not Equal To<\/span><\/h2>\n<p id=\"fe7971c638c943f6b4eea2b88a72e08c\">The probability of observing a test statistic as large in\u00a0<em class=\"italic\">magnitude<\/em>\u00a0as that observed or larger, assuming that the values of the test statistic follow a standard normal distribution.<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"d5bdb03b1cff410ca0b5e945209a9add\" class=\"img-responsive popimg aligncenter\" title=\"Ha: p \u2260 p_0 \u21d2 p-value = P(Z \u2264 -|z|) + P(Z \u2265 |z|) = 2P(Z \u2265 |z|)\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image262.gif\" alt=\"Ha: p \u2260 p_0 \u21d2 p-value = P(Z \u2264 -|z|) + P(Z \u2265 |z|) = 2P(Z \u2265 |z|)\" \/><\/span><\/span><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"cb76b68b6701448f91fc98563666056e\" class=\"img-responsive popimg aligncenter\" title=\"A normal distribution curve (N(0,1)). Marked on the horizontal axis are z-scores of 0, -|z|, and |z|, where |z| is the absolute value of the observed test statistic. The p-value is the sum of the area to the right of |z| under the curve and the area to the left of -|z| under the curve.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image263.gif\" alt=\"A normal distribution curve (N(0,1)). Marked on the horizontal axis are z-scores of 0, -|z|, and |z|, where |z| is the absolute value of the observed test statistic. The p-value is the sum of the area to the right of |z| under the curve and the area to the left of -|z| under the curve.\" \/><\/span><\/span><\/p>\n<p id=\"e128aa7dba1747f5a40f5185299add84\">This is often referred to as a\u00a0<em class=\"italic\">two-tailed<\/em>\u00a0test, since we shade both tails.<\/p>\n<\/div>\n<\/div>\n<div id=\"fc2e4d92fda74f04a8f608fde35ea1d9\" class=\"section purposewrap\">\n<div class=\"sectionContain\">\n<p id=\"a0d08d023b7845ab833c8f4289df2a87\">As noted earlier, before the widespread use of statistical software, it was common to use \u2018critical values\u2019 instead of p-values to assess the evidence provided by the data. Even though the critical values approach is not used in this course, students might find it insightful. Interested students are encouraged to review the critical value method in the following \u201cMany Students Wonder\u2026\u201d link. If your instructor states that you are required to know the critical value method, you should definitely review this information.<\/p>\n<p id=\"d5f2219cfc2345dfa72d9fa627d60a90\">Next, we will apply the p-value to our three examples. 
But first, work through the following activities, which should help your understanding.<\/p>\n<div id=\"f2f809197ba144f1b5ef16866d02876a\" class=\"section section-learnbydoing\">\n<div class=\"sectionContain\">\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div class=\"asx\">\n<div id=\"du4_m2_testprop6_tutor2\" class=\"activitywrap sectionNest flash\">\n<div class=\"actContain\">\n<div class=\"activity flash\">\n<div id=\"u4_m2_testprop6_tutor2\" class=\"flash_obj asx testFlash mark_flash\">\n<div id=\"ou4_m2_testprop6_tutor2\" class=\"page 2963160\">\n<div id=\"2963160\" class=\"question ddfb\">\n<div>\n<p id=\"N10076\">\n<div id=\"h5p-178\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-178\" class=\"h5p-iframe\" data-content-id=\"178\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.2 Learn by doing 11\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"asx\">\n<div id=\"du4_m2_testprop6_tutor3\" class=\"activitywrap sectionNest flash\">\n<div class=\"actContain\">\n<div class=\"activity flash\">\n<div id=\"u4_m2_testprop6_tutor3\" class=\"flash_obj asx testFlash mark_flash\">\n<div id=\"ou4_m2_testprop6_tutor3\" class=\"page 2963181\">\n<div id=\"2963181\" class=\"question ddfb\">\n<div>\n<p>Let\u2019s return to the scenario where we are studying the population of part-time college students. We know that in 2008, 60% of this population was female. We are curious if the proportion has decreased this year. 
We test the hypotheses: H<sub>0<\/sub>: p = 0.60 and H<sub>a<\/sub>: p &lt; 0.60, where p is the proportion of part-time college students that are female this year.<\/p>\n<div id=\"h5p-179\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-179\" class=\"h5p-iframe\" data-content-id=\"179\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.2 Learn by doing 12\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p id=\"f7428372517f413e991d5cd1dc6c96ed\">In each of the following questions, choose the pair(s) of hypotheses for the population proportion (p) and the z statistic that match the figure.<\/p>\n<div class=\"asx\">\n<div id=\"du4_m2_testprop6_tutor5\" class=\"activitywrap sectionNest flash\">\n<div class=\"activityhead\">\n<div class=\"activityinfo\"><\/div>\n<\/div>\n<div class=\"actContain\">\n<div class=\"activity flash\">\n<div id=\"u4_m2_testprop6_tutor5\" class=\"flash_obj asx testFlash mark_flash\">\n<div id=\"ou4_m2_testprop6_tutor5\" class=\"page 2963218 271724 2963219\">\n<div id=\"2963218\" class=\"question\">\n<div>\n<p><em>Question 1:<\/em><\/p>\n<div class=\"image shouldbeleft\"><img decoding=\"async\" id=\"N10070\" class=\"img-responsive popimg aligncenter\" title=\"histogram with p-value filled in\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/webcontent\/inline_assessment\/_u5_m1_digtutor14a_image1.jpg\" alt=\"histogram with p-value filled in\" \/><\/div>\n<div>\n<div id=\"h5p-180\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-180\" class=\"h5p-iframe\" data-content-id=\"180\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.2 Did I get this 
8\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<div><\/div>\n<div>\n<p><em>Question 2:<\/em><\/p>\n<div class=\"image shouldbeleft\"><img decoding=\"async\" id=\"N10073\" class=\"img-responsive popimg aligncenter\" title=\"histogram with p values filled in\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/webcontent\/inline_assessment\/_u5_m1_digtutor14b_image1.jpg\" alt=\"histogram with p values filled in\" \/><\/div>\n<div>\n<div id=\"h5p-181\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-181\" class=\"h5p-iframe\" data-content-id=\"181\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.2 Did I get this 9\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<div><\/div>\n<div>\n<p><em>Question 3:<\/em><\/p>\n<div class=\"image shouldbeleft\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"histogram with p value plugged in\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/webcontent\/inline_assessment\/_u5_m1_digtutor14c_image1.jpg\" alt=\"histogram with p value plugged in\" \/><\/div>\n<div>\n<div id=\"h5p-182\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-182\" class=\"h5p-iframe\" data-content-id=\"182\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.2 Did I get this 10\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<div><\/div>\n<div>\n<p><em>Question 4:<\/em><\/p>\n<div class=\"image shouldbeleft\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"histogram with p value filled in\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/webcontent\/inline_assessment\/_u5_m1_digtutor14d_image1.jpg\" alt=\"histogram with p value filled in\" \/><\/div>\n<div>\n<div id=\"h5p-183\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-183\" class=\"h5p-iframe\" data-content-id=\"183\" style=\"height:1px\" 
src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.2 Did I get this 11\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<div><\/div>\n<div>\n<p><em>Question 5:<\/em><\/p>\n<div class=\"image shouldbeleft\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"histogram with p value filled in\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/webcontent\/inline_assessment\/_u5_m1_digtutor14e_image1.jpg\" alt=\"histogram with p value filled in\" \/><\/div>\n<div>\n<div id=\"h5p-184\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-184\" class=\"h5p-iframe\" data-content-id=\"184\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.2 Did I get this 12\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"sectionContain\">\n<div id=\"ebfb7c516109415d948bf3e5fa49e5d8\" class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example 1<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"a2869fb5f2af4d4cb70d6748ef6f020b\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population of products produced by the machine (following the repair). We want to know p about this population, or what is the proportion of defective products. The two hypotheses are H_0: p = .20 and H_a: p &amp;lt; .20. We take a sample of 400 products, represented by a smaller circle. We find that 64 of these are defective. 
p-hat = 64\/400 = .16, and z = -2.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image264.gif\" alt=\"A large circle represents the population of products produced by the machine (following the repair). We want to know p about this population, or what is the proportion of defective products. The two hypotheses are H_0: p = .20 and H_a: p &amp;lt; .20. We take a sample of 400 products, represented by a smaller circle. We find that 64 of these are defective. p-hat = 64\/400 = .16, and z = -2.\" \/><\/span><\/span><\/p>\n<p id=\"a63a95e3c35c42bfaf0691b627e449fe\">The p-value in this case is:<\/p>\n<p id=\"e5906b3baaaf42b8b1a2023d471873f2\">* The probability of observing a test statistic as small as -2 or smaller, assuming that H<sub>o<\/sub>\u00a0is true.<\/p>\n<p id=\"fca7105a2cbd4b23b0e93e4f5eb104af\"><em class=\"italic\">OR (recalling what the test statistic actually means in this case),<\/em><\/p>\n<p id=\"a4ac020c5924401482a3757b0f7d5539\">* The probability of observing a sample proportion that is 2 standard deviations or more below <em><strong>p<sub>0<\/sub> = .20<\/strong><\/em>, assuming that <em><strong>p<sub>0 <\/sub><\/strong><\/em>is the true population proportion.<\/p>\n<p id=\"df8ce4c2b7bf4962b0223dfe4298f7d3\"><em class=\"italic\">OR, more specifically,<\/em><\/p>\n<p id=\"ab0b73b133b84ab0a0a5d735f2786782\">* The probability of observing a sample proportion of .16 or lower in a random sample of size 400, when the true population proportion is <em><strong>p<sub>0<\/sub> = .20<\/strong><\/em>.<\/p>\n<p id=\"ec99adbf019148bb9e9d41d734f41eef\">In either case, the p-value is found as shown in the following figure:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"df7d3f70bcbf4c608201aa1802a2d7fb\" class=\"img-responsive popimg aligncenter\" title=\"A normal N(0,1) curve. Marked on the horizontal axis are z-scores of 0 and -2. 
We are interested in the area to the left of -2, which is the p-value.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image266.gif\" alt=\"A normal N(0,1) curve. Marked on the horizontal axis are z-scores of 0 and -2. We are interested in the area to the left of -2, which is the p-value.\" \/><\/span><\/span><\/p>\n<p id=\"d0fa3683eb53437bb9ced6335f8fb3f7\">To find\u00a0[latex]P(Z\\leq -2)[\/latex]\u00a0we can use either a table or software. Eventually, after we understand the details, we will use software to run the test for us, and the output will give us all the information we need. The p-value that the statistical software provides for this specific example is 0.023. The p-value tells us that it is pretty unlikely (probability of .023) to get data like those observed (a test statistic of -2 or less) assuming that H<sub>o<\/sub>\u00a0is true.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"f595583cdd1e450b83dda29803c40ac3\" class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example 2<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"e0f2eca37d7a4fd9b768648e9117c76a\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population Students at the college. We want to know p about this population, or what is the population proportion of students using marijuana. The hypotheses are H_0: p = .157 and H_a: p &gt; .157. We take a sample of 100 students, represented by a smaller circle. We find that 19 use marijuana. 
p-hat = 19\/100 = .19, and z = .91\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image268.gif\" alt=\"A large circle represents the population Students at the college. We want to know p about this population, or what is the population proportion of students using marijuana. The hypotheses are H_0: p = .157 and H_a: p &gt; .157. We take a sample of 100 students, represented by a smaller circle. We find that 19 use marijuana. p-hat = 19\/100 = .19, and z = .91\" \/><\/span><\/span><\/p>\n<p id=\"ff8d175956d24f10b6f040538e15ca29\">The p-value in this case is:<\/p>\n<p id=\"b0c649a511464ed398534341d68d83e2\">* The probability of observing a test statistic as large as .91 or larger, assuming that H<sub>o<\/sub>\u00a0is true.<\/p>\n<p id=\"d2177200ec6f41bd96ff58c72280a01d\"><em class=\"italic\">OR (recalling what the test statistic actually means in this case),<\/em><\/p>\n<p id=\"df97f7f04e174687b6ceb463f3f1f66d\">* The probability of observing a sample proportion that is .91 standard deviations or more above <em><strong>p<sub>0<\/sub> = .157<\/strong><\/em>, assuming that <em><strong>p<sub>0<\/sub><\/strong><\/em>\u00a0is the true population proportion.<\/p>\n<p id=\"bd91fc990515403bb76042e5137224b3\"><em class=\"italic\">OR, more specifically,<\/em><\/p>\n<p id=\"f895853a80e744fea33ad21eb1d84736\">* The probability of observing a sample proportion of .19 or higher in a random sample of size 100, when the true population proportion is <em><strong>p<sub>0<\/sub> = .157<\/strong><\/em>.<\/p>\n<p id=\"dea71678f5dc4e40a076ecb6d8c50132\">In either case, the p-value is found as shown in the following figure:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"b4f205e0678648cea22dda0709d5f793\" class=\"img-responsive popimg aligncenter\" title=\"A N(0,1) curve for the sampling distribution. 
Marked on the horizontal axis are z-scores of 0 and .91. The p-value is the area under the curve to the right of .91.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image270.gif\" alt=\"A N(0,1) curve for the sampling distribution. Marked on the horizontal axis are z-scores of 0 and .91. The p-value is the area under the curve to the right of .91.\" \/><\/span><\/span><\/p>\n<p id=\"cb25395649734b3fa388237ee0dca943\">Again, at this point we can either use a table or software to find that the p-value is 0.182.<\/p>\n<p id=\"ae4c3c3ea1274a3fa71edf9c5c9d1625\">The p-value tells us that it is not very surprising (probability of .182) to get data like those observed (which yield a test statistic of .91 or higher) assuming that the null hypothesis is true.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"efa2425e9af64d34864bbe647ca13b8a\" class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example 3<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"efa3cf69adf648cba0251dd4e40542af\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population US Adults. We want to know p about this population, which is the proportion of US adults who support the death penalty. The two hypotheses are H_0: p = .64 and H_a: p \u2260 .64. We take a sample of 1000 US Adults, represented by a smaller circle. We find that 675 are in favor. p-hat = 675\/1000 = .675, and z = 2.31\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image271.gif\" alt=\"A large circle represents the population US Adults. 
We want to know p about this population, which is the proportion of US adults who support the death penalty. The two hypotheses are H_0: p = .64 and H_a: p \u2260 .64. We take a sample of 1000 US Adults, represented by a smaller circle. We find that 675 are in favor. p-hat = 675\/1000 = .675, and z = 2.31\" \/><\/span><\/span><\/p>\n<p id=\"c84fe337625144f7be1e4147ad0e5701\">The p-value in this case is:<\/p>\n<p id=\"ee1327e8022c470391855d5afc926492\">* The probability of observing a test statistic as large as 2.31 (or larger) or as small as -2.31 (or smaller), assuming that H<sub>o<\/sub>\u00a0is true.<\/p>\n<p id=\"cd61654a968f4421afe55b336fa87169\"><em class=\"italic\">OR (recalling what the test statistic actually means in this case),<\/em><\/p>\n<p id=\"b2a06757e5564056a5413435f40802ca\">* The probability of observing a sample proportion that is 2.31 standard deviations or more away from <em><strong>p<sub>0<\/sub> = .64<\/strong><\/em>, assuming that <em><strong>p<sub>0<\/sub><\/strong><\/em> is the true population proportion.<\/p>\n<p id=\"ad05dcc349a247cfa433d3e7ac2ccae7\"><em class=\"italic\">OR, more specifically,<\/em><\/p>\n<p id=\"f1bb56a0896a49f78c28f830d4dc1582\">* The probability of observing a sample proportion as different from .64 as .675 is, or even more different (i.e., as high as .675 or higher, or as low as .605 or lower), in a random sample of size 1,000, when the true population proportion is <em><strong>p<sub>0<\/sub> = .64<\/strong><\/em>.<\/p>\n<p id=\"abc0dfb58dd3478d9f0e9d4729b27297\">In either case, the p-value is found as shown in the following figure:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"aed13d933fdd4015a9069e5b012c0235\" class=\"img-responsive popimg aligncenter\" title=\"A N(0,1) sampling distribution curve, with the z-scores -2.31, 0, and 2.31 marked on the horizontal axis. 
The p-value is the sum of the area under the curve to the left of -2.31 and the area under the curve to the right of 2.31\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image274.gif\" alt=\"A N(0,1) sampling distribution curve, with the z-scores -2.31, 0, and 2.31 marked on the horizontal axis. The p-value is the sum of the area under the curve to the left of -2.31 and the area under the curve to the right of 2.31\" \/><\/span><\/span><\/p>\n<p id=\"ddc51e50ac2b44f8ad7f5a6b386c03a2\">Again, at this point we can either use a table or software to find that the p-value is 0.021.<\/p>\n<p id=\"cd0d91186fc44e95ac48242d74850877\">The p-value tells us that it is pretty unlikely (probability of .021) to get data like those observed (test statistic as high as 2.31 or higher or as low as -2.31 or lower) assuming that H<sub>o<\/sub>\u00a0is true.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"b73ae5f240084e35b339b6be875c984d\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Comment<\/span><\/h2>\n<p id=\"e61d4b9728f04f6fa50ec307aa870176\">We\u2019ve just seen that finding p-values involves probability calculations about the value of the test statistic assuming that H<sub>o<\/sub>\u00a0is true. In this case, when H<sub>o<\/sub>\u00a0is true, the values of the test statistic follow a standard normal distribution (i.e., the sampling distribution of the test statistic when the null hypothesis is true is N(0,1)). Therefore, p-values correspond to areas (probabilities) under the standard normal curve.<\/p>\n<p id=\"ea406fa3a28e4bf8b906d34db64f8c01\">Similarly, in\u00a0<em class=\"italic\">any test<\/em>, p-values are found using the sampling distribution of the test statistic when the null hypothesis is true (also known as the \u201cnull distribution\u201d of the test statistic). 
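<\/p>\n<p>As a check on Examples 1 to 3, the p-values can be reproduced directly from the counts, using N(0,1) as the null distribution. The following Python sketch is an illustration only, not the output of the software quoted in the examples; the helper name <code>one_proportion_z_test<\/code> and its arguments are hypothetical, and the normal tail areas are computed with <code>erfc<\/code> from the Python <code>math<\/code> module.<\/p>

```python
from math import erfc, sqrt


def one_proportion_z_test(successes, n, p0, alternative):
    # Hypothetical helper: z-test for Ho: p = p0 using the N(0,1) null distribution.
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    lower_tail = 0.5 * erfc(-z / sqrt(2))  # P(Z <= z)
    upper_tail = 0.5 * erfc(z / sqrt(2))   # P(Z >= z)
    if alternative == 'less':
        p_value = lower_tail
    elif alternative == 'greater':
        p_value = upper_tail
    elif alternative == 'two-sided':
        p_value = 2 * min(lower_tail, upper_tail)  # equals 2 * P(Z >= |z|)
    else:
        raise ValueError('alternative must be less, greater, or two-sided')
    return p_hat, z, p_value


# (label, number of successes, n, p0, alternative) taken from Examples 1-3
examples = [
    ('Example 1 (defective products)', 64, 400, 0.20, 'less'),
    ('Example 2 (marijuana use)', 19, 100, 0.157, 'greater'),
    ('Example 3 (death penalty)', 675, 1000, 0.64, 'two-sided'),
]

for label, x, n, p0, alt in examples:
    p_hat, z, p = one_proportion_z_test(x, n, p0, alt)
    print(f'{label}: p_hat = {p_hat:.3f}, z = {z:.2f}, p-value = {p:.3f}')
```

<p>Rounded as printed, the results should agree with the values quoted in the examples (z = -2, .91 and 2.31; p-values 0.023, 0.182 and 0.021). In Example 2 the p-value comes from the unrounded z (about 0.907); looking up the rounded value z = .91 in a table gives about 0.181 instead, a difference due only to rounding.<\/p>\n<p>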
In this case, it was relatively easy to argue that the null distribution of our test statistic is N(0,1). As we\u2019ll see, other tests involve other null distributions (like the t-distribution and the F-distribution), which we will mention only briefly, relying heavily on the output of our statistical package to obtain the p-values.<\/p>\n<\/div>\n<\/div>\n<p id=\"b8bfb6512feb4cdebdb9ae44125dbd83\">We\u2019ve just completed our discussion of the p-value and how it is calculated, both in general and more specifically for the z-test for the population proportion. Let\u2019s go back to the four-step process of hypothesis testing and see what we\u2019ve covered and what still needs to be discussed.<\/p>\n<div id=\"b613155bdef5459fbab7f793866f10fb\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">The Four Steps in Hypothesis Testing<\/span><\/h2>\n<ol id=\"c953f8e4073e4834a34684ad0b55a21e\">\n<li>\n<p id=\"ec2f65a550fe4e9eb9100de46bf2e880\">State the appropriate null and alternative hypotheses, H<sub>o<\/sub>\u00a0and H<sub>a<\/sub>.<\/p>\n<\/li>\n<li>\n<p id=\"f9afb2dd56c6402b8a489ca12fdb1ef4\">Obtain a random sample, collect relevant data, and\u00a0<em class=\"italic\">check whether the data meet the conditions under which the test can be used.<\/em>\u00a0If the conditions are met, summarize the data using a test statistic.<\/p>\n<\/li>\n<li>\n<p id=\"f9a379463de44ef0ad3d75b936427a70\">Find the p-value of the test.<\/p>\n<\/li>\n<li>\n<p id=\"db2f452c95b544bba06febb3489477f0\">Based on the p-value, decide whether or not the results are significant, and\u00a0<em class=\"italic\">draw your conclusions in context.<\/em><\/p>\n<\/li>\n<\/ol>\n<p id=\"d898ff3c00d64528a801c12011ef9406\">With respect to the z-test for the population proportion:<\/p>\n<p id=\"a6de07e042484a08b6a5603d3b43ddab\">Step 1: Completed<\/p>\n<p id=\"db79461b02a3461ea5433d8fe98a28c3\">Step 2: Completed<\/p>\n<p id=\"be3bf3b447bf48858bc5030cd95309db\">Step 
3: Completed<\/p>\n<p id=\"d30f78a9a76948a1880f96a37954e14e\">Step 4: This is what we will work on next.<\/p>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p id=\"a4c47efe7a9e461c907640c1e39ac328\">In 2007, a Gallup poll estimated that 45% of U.S. adults rated their financial situation as \u201cgood.\u201d We want to know if the proportion is smaller this year. We gather a random sample of 100 U.S. adults this year and find that 39 rate their financial situation as \u201cgood.\u201d Use the output from Minitab to complete the following statements about the p-value. Use numbers from the output to fill in the blanks.<\/p>\n<p id=\"ddff7421df64414f91af0b6a4a534feb\"><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"a4938aa641ad4276a7a22c77b4aeb759\" class=\"img-responsive popimg aligncenter\" title=\"Test and CI for One Proportion. Test of p = 0.45 vs p &amp;lt; 0.45 Sample: 1: X = 39 N = 100 Sample p = 0.390000 95% Upper Bound = 0.485600 Z-Value = -1.21 P-Value = 0.114\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/1_img5.gif\" alt=\"Test and CI for One Proportion. Test of p = 0.45 vs p &amp;lt; 0.45 Sample: 1: X = 39 N = 100 Sample p = 0.390000 95% Upper Bound = 0.485600 Z-Value = -1.21 P-Value = 0.114\" \/><\/span><\/span><\/p>\n<div id=\"h5p-185\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-185\" class=\"h5p-iframe\" data-content-id=\"185\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.2 Learn by doing 13\"><\/iframe><\/div>\n<\/div>\n<p>Do zinc supplements reduce a child&#8217;s risk of catching a cold? A medical study reports a p-value of 0.03. 
Are the following interpretations of the p-value valid or invalid?<\/p>\n<div id=\"h5p-186\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-186\" class=\"h5p-iframe\" data-content-id=\"186\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.2 Learn by doing 14\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<h2><span title=\"Quick scroll up\">4. Drawing Conclusions Based on the p-Value<\/span><\/h2>\n<p id=\"b64a5eb3c2044b82a168b1a6bea7c14c\">This last part of the four-step process of hypothesis testing is the same across all statistical tests. We\u2019ve already said essentially everything there is to say about it, but it can\u2019t hurt to say it again.<\/p>\n<p id=\"ff51cd1b831b4377bf08a6dd285e569b\">The p-value is a measure of how much evidence the data present against H<sub>o<\/sub>. The smaller the p-value, the more evidence the data present against H<sub>o<\/sub>.<\/p>\n<p id=\"d635a8139eec43bfa1d2aea354449653\">We already mentioned that the threshold for what counts as enough evidence against H<sub>o<\/sub>\u00a0is the\u00a0<em class=\"italic\">significance level<\/em>\u00a0(\u03b1): a cutoff point below which the p-value is considered small enough to reject H<sub>o<\/sub>\u00a0in favor of H<sub>a<\/sub>.
The most commonly used significance level is 0.05.<\/p>\n<p id=\"bcbc95d524fb4e49ae9c3ccd7ad1c108\">It is important to mention again that this step has essentially two sub-steps:<\/p>\n<ol id=\"bd988bdb3cd947e3a393833b7f9d6f62\">\n<li>\n<p id=\"ee8e6201e8974feea6d78e3fab52d303\">Based on the p-value, determine whether or not the results are significant (i.e., the data present enough evidence to reject H<sub>o<\/sub>).<\/p>\n<\/li>\n<li>\n<p id=\"af462cb25585a4baf863d81f43044396b\">State your conclusions in the context of the problem.<\/p>\n<\/li>\n<\/ol>\n<p id=\"de48212f805042b6acfaad6ce8000b10\">Let\u2019s go back to our three examples and draw conclusions.<\/p>\n<div id=\"ae341756210d417bac882287c04898da\" class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example 1<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div>\n<p id=\"ec69ab7db98e4450acf3a303fa02e3bc\">(Has the proportion of defective products been reduced from 0.20 as a result of the repair?)<\/p>\n<p id=\"a2756dad13ba4bbcbce3b07560ef653e\">We found that the p-value for this test was 0.023.<\/p>\n<p id=\"abc0a8ec01424a1b9b4c089b1a907e08\">Since 0.023 is small (in particular, 0.023 &lt; 0.05), the data provide enough evidence to reject H<sub>o<\/sub>\u00a0and conclude that as a result of the repair the proportion of defective products has been reduced to below 0.20. The following figure is the complete story of this example, and includes all the steps we went through, starting from stating the hypotheses and ending with our conclusions:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"c35ded4b8f5440248f282bc582fb3b0b\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population of products produced by the machine (following the repair). 
We want to know p about this population, or what is the proportion of defective products. The two hypotheses are H_0: p = 0.20 and H_a: p &amp;lt; 0.20. We take a sample of 400 products, represented by a smaller circle. We find that 64 of these are defective. p-hat = 64\/400 = 0.16, and z = -2 and p-value = 0.023. Since the p-value is small we conclude that H_0 can be rejected.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image275.gif\" alt=\"A large circle represents the population of products produced by the machine (following the repair). We want to know p about this population, or what is the proportion of defective products. The two hypotheses are H_0: p = 0.20 and H_a: p &amp;lt; 0.20. We take a sample of 400 products, represented by a smaller circle. We find that 64 of these are defective. p-hat = 64\/400 = 0.16, and z = -2 and p-value = 0.023. Since the p-value is small we conclude that H_0 can be rejected.\" \/><\/span><\/span><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"f89a74d4c3e4453684a436c74253577f\" class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example 2<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div>\n<p id=\"f81ca8cc68d74431bfc94e7f3b3ecaa6\">(Is the proportion of students who use marijuana at the college higher than the national proportion, which is 0.157?)<\/p>\n<p id=\"ba3e73661fbf4b74a06c8ca1e8ed6090\">We found that the p-value for this test was 0.182.<\/p>\n<p id=\"dcdbc9d50b504c1c881a9e41b46e9362\">Since 0.182 is\u00a0<em class=\"italic\">not<\/em>\u00a0small (in particular, 0.182 &gt; 0.05), the data do not provide enough evidence to reject H<sub>o<\/sub>.<\/p>\n<p id=\"c435390a011a4739a2702214f79175e7\">We therefore do\u00a0<em class=\"italic\">not<\/em>\u00a0have enough evidence to conclude that 
the proportion of students at the college who use marijuana is higher than the national figure. Here is the complete story of this example:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"ca3522fbdd28417d99c4fdd75c02d34e\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population Students at the college. We want to know p about this population, or what is the population proportion of students using marijuana. The hypotheses are H_0: p = 0.157 and H_a: p &amp;gt; 0.157. We take a sample of 100 students, represented by a smaller circle. We find that 19 use marijuana. p-hat = 19\/100 = 0.19, z = 0.91, and p-value = 0.182. Since the p-value is too large, we conclude that H_0 cannot be rejected.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image276.gif\" alt=\"A large circle represents the population Students at the college. We want to know p about this population, or what is the population proportion of students using marijuana. The hypotheses are H_0: p = 0.157 and H_a: p &amp;gt; 0.157. We take a sample of 100 students, represented by a smaller circle. We find that 19 use marijuana. p-hat = 19\/100 = 0.19, z = 0.91, and p-value = 0.182. Since the p-value is too large, we conclude that H_0 cannot be rejected.\" \/><\/span><\/span><\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example 3<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"example clearfix\">\n<div>\n<p id=\"ecfdfd05b3144451a52d4ea315a7de08\">(Has the proportion of U.S. 
adults who support the death penalty for convicted murderers changed since 2003, when it was 0.64?)<\/p>\n<p id=\"a0c917c58df84dec8ee1eff50fce2cc1\">We found that the p-value for this test was 0.021.<\/p>\n<p id=\"b880f01c5252410e971fbcd8c7e4a8ea\">Since 0.021 is small (in particular, 0.021 &lt; 0.05), the data provide enough evidence to reject H<sub>o<\/sub>, and we conclude that the proportion of U.S. adults who support the death penalty for convicted murderers has changed since 2003. Here is the complete story of this example:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"dc9ecf582d144e098cb40010d5f89159\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population US Adults. We want to know p about this population, which is the population proportion that supports the death penalty. The two hypotheses are H_0: p = 0.64 and H_a: p \u2260 0.64. We take a sample of 1000 US Adults, represented by a smaller circle. We find that 675 are in favor. p-hat = 675\/1000 = 0.675, z = 2.31, and p-value = 0.021. Because the p-value is small, we conclude that H_0 can be rejected.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image277.gif\" alt=\"A large circle represents the population US Adults. We want to know p about this population, which is the population proportion that supports the death penalty. The two hypotheses are H_0: p = 0.64 and H_a: p \u2260 0.64. We take a sample of 1000 US Adults, represented by a smaller circle. We find that 675 are in favor. p-hat = 675\/1000 = 0.675, z = 2.31, and p-value = 0.021.
Because the p-value is small, we conclude that H_0 can be rejected.\" \/><\/span><\/span><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p id=\"e243a016e06549f091b001fe4cf93228\">Two hypothesis tests were conducted.<\/p>\n<p id=\"aec5c9ee0f0d4cd18aefced39a34e765\">In test I, a significance level of 0.05 was used, and the p-value was calculated to be 0.025.<\/p>\n<p id=\"c2c11c99e783458e826aa39983762583\">In test II, a significance level of 0.01 was used, and the p-value was calculated to be 0.025.<\/p>\n<div id=\"h5p-187\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-187\" class=\"h5p-iframe\" data-content-id=\"187\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.2 Did I get this 13\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<h2><span title=\"Quick scroll up\">Let\u2019s Summarize<\/span><\/h2>\n<p id=\"d42407903ecf43318cbd764312a599b9\">We have now gone through all four steps of hypothesis testing and, in particular, seen how they apply to the z-test for the population proportion.
Let\u2019s briefly summarize:<\/p>\n<div id=\"d55b5437117e43ba852d508db9c42510\" class=\"section\">\n<div class=\"sectionContain\">\n<h3>Step 1<\/h3>\n<p id=\"f2816838a5634b519c932f7a5e3d56bb\">State the null and alternative hypotheses:<\/p>\n<p id=\"fb7890857e4a4ed19d56084b76292576\"><em><strong>H<sub>0<\/sub> : p = p<sub>0<\/sub><\/strong><\/em><\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"cb280e84790b4a1db00662fc3a142b62\" class=\"img-responsive popimg aligncenter\" title=\"H_a : p { one of &amp;lt;, &amp;gt;, or \u2260 } p_0\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image279.gif\" alt=\"H_a : p { one of &amp;lt;, &amp;gt;, or \u2260 } p_0\" \/><\/span><\/span><\/p>\n<p id=\"bd4a66c940df4841b46a6a145eb5df38\">where the choice of the appropriate alternative (out of the three) is usually quite clear from the context of the problem.<\/p>\n<\/div>\n<\/div>\n<div id=\"c7d8da069b8a48c3974ec740003586f2\" class=\"section\">\n<div class=\"sectionContain\">\n<h3>Step 2<\/h3>\n<p id=\"f34343af40e44e91974e80bcfeb147c8\">Obtain data from a sample and:<\/p>\n<p id=\"dec1adf56aec4ee0ad751ac188e4786f\">(i) Check whether the data satisfy the conditions which allow you to use this test.<\/p>\n<ul id=\"ce72d82156174778a7e8a5ec9dc1f9df\">\n<li>\n<p id=\"ac78a74eea82c4a7ca88f27a3a2b2b054\">Random sample (or at least a sample that can be considered random in context)<\/p>\n<\/li>\n<li>\n<p id=\"b46509965f0a425398caa13f47266770\"><em class=\"italic\">n<\/em>\u00a0\u22c5\u00a0<em class=\"italic\">p<\/em><sub>0<\/sub>\u00a0\u2265 10,\u00a0<em class=\"italic\">n<\/em>\u00a0\u22c5 (1 \u2212\u00a0<em class=\"italic\">p<\/em><sub>0<\/sub>) \u2265 10<\/p>\n<\/li>\n<\/ul>\n<p id=\"c9f979c7c26442e2bf03aa2377cfe365\">(ii) Calculate the sample proportion\u00a0[latex]\\hat{p}[\/latex], and summarize the data using the test 
statistic:<\/p>\n<p>[latex]z=\\frac{\\hat{p}-p_0}{\\sqrt{\\frac{p_0\\left(1-p_0\\right)}{n}}}[\/latex]<\/p>\n<p id=\"d91c5fff2150425490ecf42b82b26aa4\">(<em class=\"italic\">Recall:<\/em> This standardized test statistic represents how many standard deviations above or below\u00a0<em class=\"italic\">p<\/em><sub>0<\/sub>\u00a0our sample proportion\u00a0[latex]\\hat{p}[\/latex]\u00a0is.)<\/p>\n<\/div>\n<\/div>\n<div id=\"bac252af0fda4bfdab029458d52af720\" class=\"section\">\n<div class=\"sectionContain\">\n<h3>Step 3<\/h3>\n<p id=\"f817525d217043b895e9a90f569bb09c\">Find the p-value of the test either by using software or by using the observed test statistic\u00a0<em class=\"italic\">z<\/em>\u00a0as follows, where Z has the standard normal distribution N(0,1):<\/p>\n<p id=\"af4fbc3465f7428797f1ce00afaa2df1\">* for H<sub>a<\/sub>:\u00a0<em class=\"italic\">p<\/em>\u00a0&lt;\u00a0<em class=\"italic\">p<\/em><sub>0<\/sub>: P(Z \u2264\u00a0<em class=\"italic\">z<\/em>)<\/p>\n<p id=\"eec1af4c1f904bf4b86c8b79775d1d32\">* for H<sub>a<\/sub>:\u00a0<em class=\"italic\">p<\/em>\u00a0&gt;\u00a0<em class=\"italic\">p<\/em><sub>0<\/sub>: P(Z \u2265\u00a0<em class=\"italic\">z<\/em>)<\/p>\n<p id=\"f1fc66c0e14f467b9cab5400d261ff10\">* for H<sub>a<\/sub>:\u00a0<em class=\"italic\">p<\/em>\u00a0\u2260\u00a0<em class=\"italic\">p<\/em><sub>0<\/sub>: 2P(Z \u2265 |<em class=\"italic\">z<\/em>|)<\/p>\n<\/div>\n<\/div>\n<div id=\"c4b9fd05f441429b823a24b5728c06a1\" class=\"section\">\n<div class=\"sectionContain\">\n<h3>Step 4<\/h3>\n<p id=\"cf809820701b4652b37a8c13f389162c\">Reach a conclusion first regarding the significance of the results, and then determine what it means in the context of the problem. 
Recall that:<\/p>\n<p id=\"c3936568042e43148b510f132f9de8dd\">If the p-value is small (in particular, smaller than the significance level, which is usually .05), the results are significant (in the sense that there is a significant difference between what was observed in the sample and what was claimed in H<sub>o<\/sub>), and so we reject H<sub>o<\/sub>. If the p-value is not small, we do not have enough statistical evidence to reject H<sub>o<\/sub>, and so we continue to believe that H<sub>o<\/sub>\u00a0<em class=\"italic\">may<\/em>\u00a0be true. (Remember, in hypothesis testing we never \u201caccept\u201d H<sub>o<\/sub>.)<\/p>\n<hr \/>\n<div id=\"ccf9c0ac251147498f83f2ce4f1e43d2\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">More About Hypothesis Testing<\/span><\/h2>\n<p id=\"d1de7d8a6ee54f35b1639026f77778bf\">The issues regarding hypothesis testing that we will discuss are:<\/p>\n<p id=\"dae9ef20b54c4221b3fd02ff4387d175\">1. The effect of sample size on hypothesis testing.<\/p>\n<p id=\"dcbc1c90bae5491190b2c06be4ac3f16\">2. Statistical significance vs. practical importance. (This will be discussed in the activity following number 1.)<\/p>\n<p id=\"d424fa70d9ab4dc18e88a82916584364\">3. One-sided alternative vs. two-sided alternative\u2014understanding what is going on.<\/p>\n<p id=\"cfdaaa74ebec4a2c83ce8c1c62afba28\">4. Hypothesis testing and confidence intervals\u2014how are they related?<\/p>\n<p id=\"e40d1afc08774b1db1e780568006fd0f\">Let\u2019s start.<\/p>\n<\/div>\n<\/div>\n<div id=\"fb253aa09da44e98b1ba6e7804de87ee\" class=\"section purposewrap\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">1.
The Effect of Sample Size on Hypothesis Testing<\/span><\/h2>\n<p id=\"c3d4c71067d84b819b7dbb812d65a3cd\">We have already seen the effect that the sample size has on inference, when we discussed point and interval estimation for the population mean (\u03bc) and population proportion (p). Intuitively\u2026<\/p>\n<p id=\"eb21ab0a96ef471a85a54840c1cf5ff2\">Larger sample sizes give us more information to pin down the true nature of the population. We can therefore expect the\u00a0<em class=\"italic\">sample<\/em>\u00a0mean and\u00a0<em class=\"italic\">sample<\/em>\u00a0proportion obtained from a larger sample to be closer to the population mean and proportion, respectively. As a result, for the same level of confidence, we can report a smaller margin of error, and get a narrower confidence interval. What we\u2019ve seen, then, is that larger sample size gives a boost to how much we trust our sample results. In hypothesis testing, larger sample sizes have a similar effect. The following two examples will illustrate that a larger sample size provides more convincing evidence, and how the evidence manifests itself in hypothesis testing. Let\u2019s go back to our example 2 (marijuana use at a certain liberal arts college).<\/p>\n<div id=\"c0859fada2474e819f8fc4855c696b48\" class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<h4>2<\/h4>\n<div>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"d25bf27d5d664d609e4ca7f347d15cba\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population Students at the college. We want to know p about this population, or what is the population proportion of students using marijuana. The hypotheses are H_0: p = .157 and H_a: p &amp;gt; .157 . 
We take a sample of 100 students, represented by a smaller circle. We find that 19 use marijuana. p-hat = 19\/100 = .19, z = .91, and p-value = .182 . Since the p-value is too large we conclude that H_0 cannot be rejected.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image276.gif\" alt=\"A large circle represents the population Students at the college. We want to know p about this population, or what is the population proportion of students using marijuana. The hypotheses are H_0: p = .157 and H_a: p &amp;gt; .157 . We take a sample of 100 students, represented by a smaller circle. We find that 19 use marijuana. p-hat = 19\/100 = .19, z = .91, and p-value = .182 . Since the p-value is too large we conclude that H_0 cannot be rejected.\" \/><\/span><\/span><\/p>\n<p id=\"f0f7566de64f4531835017741d5f9fe6\">The data\u00a0<em class=\"italic\">do not<\/em>\u00a0provide enough evidence that the proportion of marijuana users at the college is higher than the proportion among all U.S. college students, which is .157. So far, nothing new. Let\u2019s make small changes to the problem (and call it example 2*). The changes are highlighted and the problem is followed by a new figure that reflects the changes.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"da5d2f88278f46eeb036ac9c1b10c8bd\" class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<h4>2*<\/h4>\n<div>\n<p id=\"fa53fc6608964e8c9ed1b4a95eb64e32\">There are rumors that students in a certain liberal arts college are more inclined to use drugs than U.S. college students in general. Suppose that\u00a0<em class=\"italic\">in a simple random sample of 400 students from the college, 76 admitted to marijuana use<\/em>. 
Do the data provide enough evidence to conclude that the proportion of marijuana users among students at the college (p) is\u00a0<em class=\"italic\">higher<\/em>\u00a0than the national proportion of .157 reported by the Harvard School of Public Health?<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"e2189d25215d416f8ff97c9b17b98931\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population Students at the college. We want to know p about this population, or what is the population proportion of students using marijuana. The hypotheses are H_0: p = .157 and H_a: p &amp;gt; .157 . We take a sample of 400 students, represented by a smaller circle, and find that 76 use marijuana.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image288.gif\" alt=\"A large circle represents the population Students at the college. We want to know p about this population, or what is the population proportion of students using marijuana. The hypotheses are H_0: p = .157 and H_a: p &amp;gt; .157 .
We take a sample of 400 students, represented by a smaller circle, and find that 76 use marijuana.\" \/><\/span><\/span><\/p>\n<p id=\"b596f3851d3b4c9fb69ec3bccc413778\">We now have a larger sample (400 instead of 100), and also we changed the number of marijuana users (76 instead of 19).<\/p>\n<p id=\"f405381e926e4a5180579996f478fedd\">Let\u2019s carry out the test in this case.<\/p>\n<p id=\"c4599ef6dd7d42609e54aa40e22fedb9\"><em class=\"italic\">I.<\/em>\u00a0The question of interest did not change, so we are testing the same hypotheses:<\/p>\n<p id=\"b58804fb8f8b446ebf7210d5a6b5cdf2\">H<sub>o<\/sub>: p = .157<\/p>\n<p id=\"b00c07bc7329439abca95c2b6d3999e2\">H<sub>a<\/sub>: p &gt; .157<\/p>\n<p id=\"c51d7674471c4672925d0993b071213e\"><em class=\"italic\">II.<\/em>\u00a0We select a random sample of size\u00a0<em class=\"italic\">400<\/em>\u00a0and find that 76 are marijuana users.<\/p>\n<p id=\"eaa615f5f17b4d55944815211027368a\">(Note that the data satisfy the conditions that allow us to use this test. 
Verify this yourself).<\/p>\n<p id=\"e014135b7d0241358dc4e9d4525911c1\">Let\u2019s summarize the data:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"eadb30fd3870434fa542bd7d8284aecf\" class=\"img-responsive popimg aligncenter\" title=\"p-hat = 76\/400 = .19\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image289.gif\" alt=\"p-hat = 76\/400 = .19\" \/><\/span><\/span><\/p>\n<p id=\"e8d8b11c459248e1a1e8fcd7ba22dd4f\">This is the same sample proportion as in the original problem, so it seems that the data give us the same evidence, but when we calculate the test statistic, we see that actually this is not the case:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"b9f4178a594648478e03f6737a6c1401\" class=\"img-responsive popimg aligncenter\" title=\"z = (.19 - .157) \/ \u221a[( .157 (1 - .157) )\/400 ] \u2248 1.81\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image290.gif\" alt=\"z = (.19 - .157) \/ \u221a[( .157 (1 - .157) )\/400 ] \u2248 1.81\" \/><\/span><\/span><\/p>\n<p id=\"ae79ea390c4b4c439015e57910d00de7\">Even though the sample proportion is the same (.19), since here it is based on a larger sample (400 instead of 100), it is 1.81 standard deviations above the null value of .157 (as opposed to .91 standard deviations in the original problem).<\/p>\n<p id=\"ff3b5a6ceed042c9a3f37ecdb432dbdb\"><em class=\"italic\">III.<\/em>\u00a0For the p-value, we use statistical software to find p-value = 0.035.<\/p>\n<p id=\"b9ed7681b8004c97b21f0169920ab9a8\">The p-value here is .035 (as opposed to .182 in the original problem). In other words, when H<sub>o<\/sub>\u00a0is true (i.e. 
when p = .157), it is quite unlikely (probability .035) to get a sample proportion of .19 or higher from a sample of size 400, but not very unlikely (probability .182) from a sample of size 100.<\/p>\n<p id=\"ce9f11eb2d9a4b4b8d89b4835273ca95\"><em class=\"italic\">IV.<\/em><\/p>\n<p id=\"cb9bf442c3fd46258ef7995f0fd3139c\">Since .035 &lt; .05, our results here are significant. In other words, in example 2* the data provide enough evidence to reject H<sub>o<\/sub>\u00a0and conclude that the proportion of marijuana users at the college is higher than among all U.S. college students.<\/p>\n<p id=\"ec6f32f6743e4553bb8e8f5743f08282\">Let\u2019s summarize with a figure:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"b3bc15591ffe4a668340363a70a425a4\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population Students at the college. We want to know p about this population, or what is the population proportion of students using marijuana. The hypotheses are H_0: p = .157 and H_a: p &amp;gt; .157 . We take a sample of 400 students, represented by a smaller circle, and find that 76 use marijuana. Conditions are met to use our method, so p-hat = 76\/400 = .19, z = 1.81, and p-value = .035 . The p-value is low enough to let us conclude that we can reject H_0.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image291.gif\" alt=\"A large circle represents the population Students at the college. We want to know p about this population, or what is the population proportion of students using marijuana. The hypotheses are H_0: p = .157 and H_a: p &amp;gt; .157 . We take a sample of 400 students, represented by a smaller circle, and find that 76 use marijuana. Conditions are met to use our method, so p-hat = 76\/400 = .19, z = 1.81, and p-value = .035 .
The p-value is low enough to let us conclude that we can reject H_0.\" \/><\/span><\/span><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p id=\"acb8dbf1861244fa95d7ea48986f7c0b\">What do we learn from these two examples?<\/p>\n<p id=\"a06ab0a108c246b0ba7ec441101f51c1\">We see that sample results based on a larger sample carry more weight.<\/p>\n<p id=\"e2130163634548ba88a5d407e31e1782\">In example 2, we saw that a sample proportion of .19 based on a sample of size 100 was not enough evidence that the proportion of marijuana users at the college is higher than .157. Recall, from our general overview of hypothesis testing, that this conclusion (not having enough evidence to reject the null hypothesis)\u00a0<em class=\"italic\">doesn\u2019t<\/em>\u00a0mean the null hypothesis is necessarily true (so, we never \u201caccept\u201d the null); it only means that the particular study didn\u2019t yield sufficient evidence to reject the null. It\u00a0<em class=\"italic\">might<\/em>\u00a0be that the sample size was simply too small to detect a statistically significant difference.<\/p>\n<p id=\"e1aa355f90434c5aaf01a0f2c49e45ee\">However, in example 2*, we saw that when the sample proportion of .19 is obtained from a sample of size 400, it carries much more weight and, in particular, provides enough evidence that the proportion of marijuana users at the college is higher than .157 (the national figure).
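<\/p>
<p>The effect of sample size can also be checked numerically. The Python sketch below uses a hypothetical helper named <code>one_prop_ztest_greater<\/code> (it is not a library function) that checks the n\u00a0\u22c5\u00a0p<sub>0<\/sub> and n\u00a0\u22c5\u00a0(1\u00a0\u2212\u00a0p<sub>0<\/sub>) conditions, computes the test statistic with the formula from the summary of Step 2, and finds the one-sided p-value for H<sub>a<\/sub>: p &gt; p<sub>0<\/sub> using the standard library's <code>statistics.NormalDist<\/code>. It then runs the test for example 2 (19 of 100) and example 2* (76 of 400) at the 0.05 significance level.<\/p>

```python
from math import sqrt
from statistics import NormalDist


def one_prop_ztest_greater(successes, n, p0):
    # Hypothetical helper: z-test for one proportion with Ha: p > p0.
    # Step 2 conditions (the random-sample condition cannot be checked in code).
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError('conditions n*p0 >= 10 and n*(1 - p0) >= 10 not met')
    p_hat = successes / n
    # Standard deviation of p-hat when H0 is true.
    sd_null = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / sd_null
    p_val = 1 - NormalDist().cdf(z)    # P(Z >= z)
    return p_hat, z, p_val


ALPHA = 0.05
for successes, n in [(19, 100), (76, 400)]:
    p_hat, z, p_val = one_prop_ztest_greater(successes, n, 0.157)
    decision = 'reject H0' if p_val < ALPHA else 'do not reject H0'
    print(f'n = {n}: p-hat = {p_hat:.2f}, z = {z:.2f}, '
          f'p-value = {p_val:.3f}, {decision}')
```

<p>This should print z = 0.91 with p-value 0.182 (do not reject H<sub>o<\/sub>) for n = 100 and z = 1.81 with p-value 0.035 (reject H<sub>o<\/sub>) for n = 400. The sample proportion is .19 in both cases, but multiplying the sample size by 4 divides the standard deviation of the sample proportion by [latex]\\sqrt{4}=2[\/latex], so the test statistic doubles.<\/p>
<p>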
In\u00a0<em class=\"italic\">this<\/em>\u00a0case, the sample size of 400\u00a0<em class=\"italic\">was<\/em>\u00a0large enough to detect a statistically significant difference.<\/p>\n<p>The following activity will allow you to practice the ideas and terminology used in hypothesis testing when a result is not statistically significant.<\/p>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p id=\"c0a97d5d9ed64a60a7e9822c824d3ca9\">Suppose that only 40% of the U.S. public supported the general direction of the previous U.S. administration&#8217;s policies. To gauge whether the nationwide proportion, p, of support for the\u00a0<em class=\"italic\">current<\/em>\u00a0administration is higher than 40%, a major polling organization conducts a random poll to test the hypotheses:<\/p>\n<p id=\"e4176a7efc9d49eea3c0a5593674aed0\">H<sub>o<\/sub>: p = .40<\/p>\n<p id=\"bc09925178d24ff9bf73dfa848ee62e6\">H<sub>a<\/sub>: p &gt; .40<\/p>\n<p id=\"d952849a45a544a5a10fa5951af4f062\">The results are reported to be\u00a0<em class=\"italic\">not statistically significant<\/em>, with a\u00a0<em class=\"italic\">p-value of .214<\/em>.<\/p>\n<div class=\"asx\">\n<div id=\"du4_m2_testprop10_tutor1\" class=\"activitywrap sectionNest flash\">\n<div class=\"actContain\">\n<div class=\"activity flash\">\n<div id=\"u4_m2_testprop10_tutor1\" class=\"flash_obj asx testFlash mark_flash\">\n<div id=\"ou4_m2_testprop10_tutor1\" class=\"page 271750 2963696 2963697 2963698 2963699\">\n<div>\n<p id=\"N1006E\">Decide whether each of the following statements is a valid conclusion or an invalid conclusion, based on the study:<\/p>\n<div id=\"h5p-188\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-188\" class=\"h5p-iframe\" data-content-id=\"188\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.2 Learn by doing 
15\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<h2><span title=\"Quick scroll up\">3. One-Sided Alternative vs. Two-Sided Alternative<\/span><\/h2>\n<p id=\"c19504c1e2594a228002d51620c18a45\">Recall that earlier we noticed (only visually) that for a given value of the test statistic z, the p-value of the two-sided test is twice as large as the p-value of the one-sided test. We will now discuss this further. In particular, we will use our example 2 (marijuana users at a certain college) to gain better intuition about this fact.<\/p>\n<p id=\"a225dc4c5ab84c51a0cb415c95caad64\">For illustration purposes, we are actually going to use example 2* (where out of a\u00a0<em class=\"italic\">sample of size 400<\/em>, 76 were marijuana users). Let\u2019s recall example 2*, but this time give two versions of it: the original version and a slightly changed version, which we\u2019ll call example 2**. The differences are highlighted in italics.<\/p>\n<div id=\"b3f3ab16abb246ec93a15198f25c896f\" class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<h4>2*<\/h4>\n<div>\n<p id=\"e796de70099849a98b2f3255f6a9cef2\"><em class=\"italic\">There are rumors that students at a certain liberal arts college are more inclined to use drugs than U.S. college students in general.<\/em>\u00a0Suppose that in a simple random sample of 400 students from the college, 76 admitted to marijuana use. Do the data provide enough evidence to conclude that the proportion of marijuana users among the students in the college (p) is\u00a0<em class=\"italic\">higher<\/em>\u00a0than the national proportion, which is .157? 
(This number is reported by the Harvard School of Public Health.)<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"b1d847d16a934c68b6ba2c87e79636f0\" class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<h4>2**<\/h4>\n<div>\n<p id=\"fe40fffd0f484d87b39f008189f36d79\"><em class=\"italic\">The dean of students in a certain liberal arts college was interested in whether the proportion of students who use drugs in her college is different from the proportion among U.S. college students in general.<\/em>\u00a0Suppose that in a simple random sample of 400 students from the college, 76 admitted to marijuana use. Do the data provide enough evidence to conclude that the proportion of marijuana users among the students in the college (p)\u00a0<em class=\"italic\">differs<\/em>\u00a0from the national proportion, which is .157? (This number is reported by the Harvard School of Public Health.)<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div id=\"h5p-189\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-189\" class=\"h5p-iframe\" data-content-id=\"189\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.2 Learn by doing 16\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<p id=\"b2808b5bfe224adfb30bbe61e7d9b4f7\">Indeed, in example 2* we suspect from the outset (based on the rumors) that the overall proportion (p) of marijuana smokers at the college is\u00a0<em class=\"italic\">higher<\/em>\u00a0than the reported national proportion of .157, and therefore the appropriate alternative is H<sub>a<\/sub>: p &gt; .157. 
In example 2**, as a result of the change of wording (which eliminated the part about the rumors), we simply wonder if p is\u00a0<em class=\"italic\">different<\/em>\u00a0(in either direction) from the reported national proportion of .157, and therefore the appropriate alternative is the two-sided one:\u00a0H<sub>a<\/sub>: p \u2260 .157. 
Would switching to the two-sided alternative have an effect on our results?<\/p>\n<p id=\"fd7eb22963a54531a09aa9e79df1e395\">Let\u2019s explore that.<\/p>\n<div id=\"bbb91b99778249fc9ae18a9862057ba8\" class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<h4>2*<\/h4>\n<div>\n<p id=\"e3cdac4a5b464033a6600ff7de069e38\">We already carried out the test for this example, and the results are summarized in the following figure:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"a784b2005bb74647bade44f4d51b99d1\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population Students at the college. We want to know p about this population, or what is the population proportion of students using marijuana. The hypotheses are H_0: p = .157 and H_a: p &gt; .157 . We take a sample of 400 students, represented by a smaller circle, and find that 76 use marijuana. Conditions are met to use our method, so p-hat = 76\/400 = .19, z = 1.81, and p-value = .035 . The p-value is low enough to let us conclude that we can reject H_0\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image293.gif\" alt=\"A large circle represents the population Students at the college. We want to know p about this population, or what is the population proportion of students using marijuana. The hypotheses are H_0: p = .157 and H_a: p &gt; .157 . We take a sample of 400 students, represented by a smaller circle, and find that 76 use marijuana. Conditions are met to use our method, so p-hat = 76\/400 = .19, z = 1.81, and p-value = .035 . 
The p-value is low enough to let us conclude that we can reject H_0\" \/><\/span><\/span><\/p>\n<p id=\"baca855d77b748478774e464b6ede114\">The following figure reminds you how the p-value was found (using the test statistic):<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"e255224896e740739c7d90221cb046d5\" class=\"img-responsive popimg aligncenter\" title=\"A N(0,1) curve with z-scores of 0 and 1.81 marked on the horizontal axis. The area to the right of 1.81 under the curve is the p-value = .035\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image294.gif\" alt=\"A N(0,1) curve with z-scores of 0 and 1.81 marked on the horizontal axis. The area to the right of 1.81 under the curve is the p-value = .035\" \/><\/span><\/span><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"d27a56ac1d1f4a349c27dedf6af21074\" class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<h4>2**<\/h4>\n<div>\n<p id=\"b071d595b9244e3a91b1d10cb46dbf27\">I. Here we are testing:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"c00f87e2c3a141eaae01fde6b9c92344\" class=\"img-responsive popimg aligncenter\" title=\"H_0: p = .157, H_a: p \u2260 .157\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image295.gif\" alt=\"H_0: p = .157, H_a: p \u2260 .157\" \/><\/span><\/span><\/p>\n<p id=\"a7537b55cbd548e9aefb667abe45d3e1\">II. 
Since we have the same data as in example 2* (76 marijuana users out of 400), we have the same sample proportion and the same test statistic:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"ef3980b82ac84102830a5a72a8e0ba39\" class=\"img-responsive popimg aligncenter\" title=\"p-hat = .19 z = 1.81\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image296.gif\" alt=\"p-hat = .19 z = 1.81\" \/><\/span><\/span><\/p>\n<p id=\"befd1bcc71cd4ae2a7b341e815b8b626\">III. Since the calculation of the p-value depends on the type of alternative we have, this is where the two examples start to differ. Statistical software tells us that the p-value for example 2** is 0.070. Here is a figure that reminds us how the p-value was calculated (based on the test statistic):<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"e877934274134200889f318c3ff31d32\" class=\"img-responsive popimg aligncenter\" title=\"A N(0,1) curve with z-scores of -1.81, 0, and 1.81 marked on the horizontal axis. The area to the right of 1.81 under the curve is .035 . The area to the left of -1.81 is also .035 . The p-value is the sum of these two areas, which is .07\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image297.gif\" alt=\"A N(0,1) curve with z-scores of -1.81, 0, and 1.81 marked on the horizontal axis. The area to the right of 1.81 under the curve is .035 . The area to the left of -1.81 is also .035 . The p-value is the sum of these two areas, which is .07\" \/><\/span><\/span><\/p>\n<p id=\"a0790a23e2f74835a505e77746e74db8\">IV. If we use the .05 level of significance, the p-value we got is not small enough (.07&gt;.05), and therefore we cannot reject H<sub>o<\/sub>. 
In other words, the data do not provide enough evidence to conclude that the proportion of marijuana smokers in the college is different from the national proportion (.157).<\/p>\n<\/div>\n<\/div>\n<\/div>\n<p id=\"f277f12c88294e88a8aa353f0e93aec7\">What happened here?<\/p>\n<p id=\"fb9a1341fb744338b0531a28e6dc4f8b\">Numerically, what happened is clear. The p-value of the one-sided test (example 2*) is .035, so the results are significant at the .05 significance level. However, the p-value of the two-sided test (example 2**) is twice the p-value of the one-sided test, namely 2 \u00d7 .035 = .07, so the results are not significant at the .05 significance level.<\/p>\n<p id=\"b53f81289a66457d992ce2104e1150ff\">Here is a more conceptual explanation:<\/p>\n<p id=\"ffdcc4ffd2554a84a975106dddd31cdc\">The idea is that in Example 2*, we began our hypothesis test with a piece of information (in the form of a rumor) about the unknown population proportion p, which gave us a sort of head-start towards the goal of rejecting the null hypothesis. The evidence provided by the data was then enough to cross the finish line and reject H<sub>o<\/sub>. In Example 2**, we had no prior information to go on, and the data alone were not enough evidence to cross the finish line and reject H<sub>o<\/sub>. The following figure illustrates this idea:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"fb66dd4559344499bbf52913b48cdd9c\" class=\"img-responsive popimg aligncenter\" title=\"Two 'races' which illustrate why in the two-sided example we could not reject H_0. In the first race, H_0: p = .157, H_a: p &gt; .157 . This is a one-sided hypothesis, so we get a head start on the race. The data get us further along the race track, far enough that we cross the 'finish line' (a p-value below the significance level of .05), so we have enough evidence to reject H_0. In the two-sided problem where H_0: p = .157, H_a: p \u2260 .157, we do not have a head start, since we are not told which side to expect. So only the data move us along the race track, and they do not move us far enough to cross the 'finish line.'\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image298.gif\" alt=\"Two 'races' which illustrate why in the two-sided example we could not reject H_0. In the first race, H_0: p = .157, H_a: p &gt; .157 . This is a one-sided hypothesis, so we get a head start on the race. The data get us further along the race track, far enough that we cross the 'finish line' (a p-value below the significance level of .05), so we have enough evidence to reject H_0. In the two-sided problem where H_0: p = .157, H_a: p \u2260 .157, we do not have a head start, since we are not told which side to expect. So only the data move us along the race track, and they do not move us far enough to cross the 'finish line.'\" \/><\/span><\/span><\/p>\n<p id=\"d1907e9896d44fc7af7b38c94ce1d0d1\">We can summarize and say that in general it is harder to reject H<sub>o<\/sub>\u00a0against a two-sided H<sub>a<\/sub>\u00a0because the p-value is twice as large. Intuitively, a one-sided alternative gives us a head-start, and on top of that we have the evidence provided by the data. 
When our alternative is two-sided, we get no head-start and all we have are the data, so it is harder to cross the finish line and reject H<sub>o<\/sub>.<\/p>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p id=\"b57d95b661f34e4c9c7c9c16dd424e03\">Consider the following two hypothesis testing scenarios for the population proportion (p) and corresponding studies:<\/p>\n<p id=\"f900f0fd7a4c4bc9af87711ad03e979c\"><em class=\"italic\">I.<\/em>\u00a0The UCLA Internet Report (February 2003) estimated that roughly 8.7% of Internet users are extremely concerned about credit card fraud when buying online. A study was designed in order to examine whether that proportion has changed since.<\/p>\n<p id=\"bf94b29bb3534b5894cc6ab035644a79\"><em class=\"italic\">II.<\/em>\u00a0The UCLA Internet Report (February 2003) estimated that roughly 8.7% of Internet users are extremely concerned about credit card fraud when buying online. In light of the increasing problem of spyware, a study was designed in order to examine whether that proportion has increased since.<\/p>\n<div id=\"h5p-190\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-190\" class=\"h5p-iframe\" data-content-id=\"190\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.2 Did I get this 14\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<h2><span title=\"Quick scroll up\">4. Hypothesis Testing and Confidence Intervals<\/span><\/h2>\n<p id=\"d01bed76f8df4994a15f3ea2e5d94377\">The last topic we want to discuss is the relationship between hypothesis testing and confidence intervals. 
Even though the flavor of these two forms of inference is different (confidence intervals estimate a parameter, and hypothesis testing assesses the evidence in the data against one claim and in favor of another), there is a strong link between them.<\/p>\n<p id=\"bad6a7f471594153b6a300fa9b77f8a5\">We will explain this link (using the z-test and confidence interval for the population proportion), and then explain how confidence intervals can be used after a test has been carried out.<\/p>\n<p id=\"b9078a5b967749b193bc3176a318efde\">Recall that a confidence interval gives us a set of plausible values for the unknown population parameter. We may therefore examine a confidence interval to informally decide if a proposed value of the population proportion seems plausible.<\/p>\n<p id=\"cad80ff4f4124760b82ccddaa6fca9d2\">For example, if a 95% confidence interval for p, the proportion of all U.S. adults already familiar with Viagra in May 1998, was (.61, .67), then it seems clear that we should be able to reject a claim that only 50% of all U.S. adults were familiar with the drug, since based on the confidence interval, .50 is not one of the plausible values for p.<\/p>\n<p id=\"a66e1b1072ad4af3a857fb927321b1d5\">In fact, the information provided by a confidence interval can be formally related to the information provided by a hypothesis test. 
(<em class=\"italic\">Comment:<\/em>\u00a0The relationship is more straightforward for two-sided alternatives, and so we will not present results for the one-sided cases.)<\/p>\n<p id=\"c3151cd7fb0d4465af22bb8cb269b6a3\">Suppose we want to carry out the\u00a0<em class=\"italic\">two-sided test:<\/em><\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"db49bb1a74f345c4b66d3951487011be\" class=\"img-responsive popimg aligncenter\" title=\"H_0: p = p_0 and H_a: p \u2260 p_0\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image299.gif\" alt=\"H_0: p = p_0 and H_a: p \u2260 p_0\" \/><\/span><\/span><\/p>\n<p id=\"e09dfdc75c914fa79cdaf545b5c925fd\">using a significance level of .05.<\/p>\n<p id=\"cd0dc96c66ff41f58ed171a01a8c8a17\">An alternative way to perform this test is to find a 95%\u00a0<em class=\"italic\">confidence interval<\/em>\u00a0for p and check:<\/p>\n<p id=\"a4635f75a8cd469d8e993d99225a44d5\">If <em>p<sub>0<\/sub><\/em> falls <em class=\"italic\">outside<\/em>\u00a0the confidence interval,\u00a0<em class=\"italic\">reject <\/em>H<sub>o<\/sub>.<\/p>\n<p id=\"ce19de1cc30049769f91cd772b6e353e\">If <em>p<sub>0<\/sub><\/em> falls\u00a0<em class=\"italic\">inside<\/em>\u00a0the confidence interval,\u00a0<em class=\"italic\">do not reject <\/em>H<sub>o<\/sub>.<\/p>\n<p id=\"a42d42ad1cc745f387fe9caf98aa509c\">In other words, if <em>p<sub>0<\/sub><\/em>\u00a0is not one of the plausible values for p, we reject H<sub>o<\/sub>.<\/p>\n<p id=\"e27bb1bfd4674cd08e06b20fb9bd4078\">If <em>p<sub>0<\/sub><\/em> is a plausible value for p, we cannot reject H<sub>o<\/sub>.<\/p>\n<p id=\"fa67ec65a72c42429ba33f88bfdbb3eb\">(<em class=\"italic\">Comment:<\/em>\u00a0Similarly, the results of a test using a significance level of .01 can be related to the 99% confidence interval.)<\/p>\n<p id=\"de79fd1d4c95416b880990511597daeb\">Let\u2019s look at two 
examples:<\/p>\n<div id=\"da587467bc314d46b104f61a20099040\" class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p id=\"b376a4b40c8344d680eea4e7316c5e88\">Recall example 3, where we wanted to know whether the proportion of U.S. adults who support the death penalty for convicted murderers has changed since 2003, when it was .64.<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"a2dfdd3a552341e1b34016379d739813\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population US Adults. We want to know p about this population, which is the population proportion who support the death penalty. The question we want to answer is &quot;has p changed since 2003 (when it was .64)?&quot; We take a sample of 1000 US Adults, represented by a smaller circle. We find that 675 are in favor. p-hat =675\/1000 = .675.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image223.gif\" alt=\"A large circle represents the population US Adults. We want to know p about this population, which is the population proportion who support the death penalty. The question we want to answer is &quot;has p changed since 2003 (when it was .64)?&quot; We take a sample of 1000 US Adults, represented by a smaller circle. We find that 675 are in favor. 
p-hat =675\/1000 = .675.\" \/><\/span><\/span><\/p>\n<p id=\"beab4529e3d9490c8d19432d6d8f118c\">We are testing:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"ceb6f023244949d5881a0933d9335913\" class=\"img-responsive popimg aligncenter\" title=\"H_0: p = .64 and H_a: p \u2260 .64;\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image300.gif\" alt=\"H_0: p = .64 and H_a: p \u2260 .64;\" \/><\/span><\/span><\/p>\n<p id=\"a2c70266a8794a9db0e9aa235e98c809\">and as the figure reminds us, we took a sample of 1,000 U.S. adults, and the data told us that 675 supported the death penalty for convicted murderers (i.e.\u00a0[latex]\\hat{p}=.675[\/latex]).<\/p>\n<p id=\"b4ef6b335e0e42fb9f9bb377efb78c99\">A 95% confidence interval for p, the proportion of\u00a0<em class=\"italic\">all<\/em>\u00a0U.S. adults who support the death penalty, is:<\/p>\n<p>[latex].675\\pm2\\sqrt{\\frac{.675(1-.675)}{1000}}\\approx.675\\pm.03=\\left(.645,\\ .705\\right)[\/latex]<\/p>\n<p id=\"f803643a0e904b6cbe32b4485d33c2db\">Since the 95% confidence interval for p does not include .64 as a plausible value for p, we can reject H<sub>o<\/sub>\u00a0and conclude (as we did before) that the proportion of U.S. adults who support the death penalty for convicted murderers has changed since 2003.<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"b40bab0cc56b4686a9bd633af805b567\" class=\"img-responsive popimg aligncenter\" title=\"A number line illustrating the 95% confidence interval for p. The interval is (.645, .705). In H_0, p = .64, which is outside of this interval, so we can reject H_0: p = .64 .\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image302.gif\" alt=\"A number line illustrating the 95% confidence interval for p. 
The interval is (.645, .705). In H_0, p = .64, which is outside of this interval, so we can reject H_0: p = .64 .\" \/><\/span><\/span><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"cc68945d65d14b889df1a8de8c98ebf8\" class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p id=\"b077f9f589ba45a195b1995f61b8f341\">You and your roommate are arguing about whose turn it is to clean the apartment. Your roommate suggests that you settle this by tossing a coin and takes one out of a locked box he has on the shelf. Suspecting that the coin might not be fair, you decide to test it first. You toss the coin 80 times, thinking to yourself that if, indeed, the coin is fair, you should get around 40 heads. Instead you get 48 heads. You are puzzled. You are not sure whether getting 48 heads out of 80 is enough evidence to conclude that the coin is unbalanced, or whether this is a result that could have happened just by chance when the coin is fair.<\/p>\n<p id=\"b2e973e7b1bf409a88c6fec4a2c512fa\">Statistics can help you answer this question.<\/p>\n<p id=\"f4dfe567d3cc45cd883a8026d6b7704c\">Let p be the true proportion (probability) of heads. 
We want to test whether the coin is fair or not:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"ebb9890086ef4dce93b9525aab3ac33f\" class=\"img-responsive popimg aligncenter\" title=\"H_0: p = .5, H_a: p \u2260 .5\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image303.gif\" alt=\"H_0: p = .5, H_a: p \u2260 .5\" \/><\/span><\/span><\/p>\n<p id=\"fc375fd9858745afac3f220519fe302a\">Out of n=80 tosses we got 48 heads, so the sample proportion of heads is: [latex]\\hat{p}=\\frac{48}{80}=.6[\/latex]<\/p>\n<p id=\"d642f2d91b53451da8bb13cfb5a4440b\">The 95% confidence interval for p, the true proportion of heads for this coin, is:<\/p>\n<p>[latex].6\\pm 2 \\sqrt{\\frac{.6(1-.6)}{80}}\\approx .6\\pm .11=(.49,.71)[\/latex]<\/p>\n<p id=\"acb3eab7d2134911b3393824bd800459\">Since in this case .5 is one of the plausible values for p, we cannot reject H<sub>o<\/sub>. In other words, the data do not provide enough evidence to conclude that the coin is not fair.<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"b4608b71275c4648a02059ffb9876b9a\" class=\"img-responsive popimg aligncenter\" title=\"A number line showing the 95% confidence interval for p, which is (.49, .71). H_0 is p = .5, which falls within this interval, so we cannot reject H_0: p = .5 .\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image306.gif\" alt=\"A number line showing the 95% confidence interval for p, which is (.49, .71). 
H_0 is p = .5, which falls within this interval, so we cannot reject H_0: p = .5 .\" \/><\/span><\/span><\/p>\n<\/div>\n<\/div>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p id=\"ed683b7754ba4a07a00abffab8e38ad5\">The UCLA Internet Report (February 2003) estimated that roughly 8.7% of Internet users are extremely concerned about credit card fraud when buying online. A study was designed in order to examine whether that proportion has changed since. Let p be the proportion of all Internet users who are extremely concerned about credit card fraud. In this study we are therefore testing:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"df8d7115bdee4053a5086811a084a665\" class=\"img-responsive popimg aligncenter\" title=\"H_0: p = .087, H_a: p \u2260 .087\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image390.gif\" alt=\"H_0: p = .087, H_a: p \u2260 .087\" \/><\/span><\/span><\/p>\n<p id=\"ce85aeacfc5b4d6b8ae1cbe9c346b85f\">Based on the collected data, a 95% confidence interval for p was found to be (.08, .14).<\/p>\n<div id=\"h5p-191\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-191\" class=\"h5p-iframe\" data-content-id=\"191\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.2 Did I get this 15\"><\/iframe><\/div>\n<\/div>\n<p id=\"ec5707347543451fad6cb073a8d7b8fe\">The UCLA Internet Report (February 2003) estimated that roughly 60.5% of U.S. adults use the Internet at work for personal use. A follow-up study was conducted in order to explore whether that figure has changed since. Let p be the proportion of U.S. 
adults who use the Internet at work for personal use.<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"f52bc4a09cc04c4d81529cfc02542a66\" class=\"img-responsive popimg aligncenter\" title=\"H_0: p = .605, H_a: p \u2260 .605\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image391.gif\" alt=\"H_0: p = .605, H_a: p \u2260 .605\" \/><\/span><\/span><\/p>\n<p id=\"f6e68aaa4e2c4afe854b71d933f362d0\">Based on the collected data, the p-value of the test was found to be .001.<\/p>\n<div id=\"h5p-192\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-192\" class=\"h5p-iframe\" data-content-id=\"192\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"8.2 Did I get this 16\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<h2><span title=\"Quick scroll up\">Comment<\/span><\/h2>\n<p id=\"c7d73f362d7f4e9a8f6547e0733a6c2f\">The context of the coin-tossing example is a good opportunity to revisit an important point that was discussed earlier.<\/p>\n<p id=\"f813b75445614a33a22835bf7d8ade54\">Even though we use .05 as a cutoff to guide our decision about whether the results are significant, we should not treat it as inviolable; we should always apply our own judgment. Let\u2019s look at the coin-tossing example again.<\/p>\n<p id=\"f1b85abb478b49b488c8bb38ef67d3cb\">It turns out that the p-value of this test is .0734. In other words, if the coin were\u00a0<em class=\"italic\">fair<\/em>, getting a sample proportion of heads as far from .5 as 48\/80=.6 (or even farther, in either direction) in 80 tosses would not be extremely unlikely, but it would be fairly unlikely (probability .0734). It is true that using the .05 significance level (cutoff), .0734 is not considered small enough to conclude that the coin is not fair. 
However, if you really don\u2019t want to clean the apartment, the p-value might be small enough for you to ask your roommate to use a different coin, or to provide one yourself!<\/p>\n<p id=\"N10B01\">Here is our final point on this subject:<\/p>\n<p id=\"N10B04\">When the data provide enough evidence to reject H<sub>o<\/sub>, we can conclude (depending on the alternative hypothesis) that the population proportion is either less than, greater than, or not equal to the null value\u00a0[latex]p_{0}[\/latex]. However, we do not get a more informative statement about its actual value. It might be of interest, then, to follow the test with a 95% confidence interval that will give us more insight into the actual value of p.<\/p>\n<div class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div>\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div>\n<p id=\"N10B1C\">In our example 3,<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"_i_0\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population US Adults. We want to know p about this population, which is the population proportion who support the death penalty. The two hypotheses are H_0: p = .64 and H_a: p \u2260 .64 . We take a sample of 1000 US Adults, represented by a smaller circle. We find that 675 are in favor. p-hat = 675\/1000 = .675, z = 2.31, and the p-value is .021 , which is small enough to let us reject H_0.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image277.gif\" alt=\"A large circle represents the population US Adults. We want to know p about this population, which is the population proportion who support the death penalty. 
The two hypotheses are H_0: p = .64 and H_a: p \u2260 .64. We take a sample of 1000 US adults, represented by a smaller circle. We find that 675 are in favor. p-hat = 675\/1000 = .675, z = 2.31, and the p-value is .021, which is small enough to let us reject H_0.\" \/><\/span><\/span><\/p>\n<p id=\"N10B25\">we concluded that the proportion of U.S. adults who support the death penalty for convicted murderers has changed since 2003, when it was .64. It is probably of interest not only to know that the proportion has changed, but also to estimate what it has changed to. We\u2019ve calculated the 95% confidence interval for p on the previous page and found that it is (.645, .705).<\/p>\n<p id=\"N10B28\">We can combine our conclusions from the test and the confidence interval and say:<\/p>\n<p id=\"N10B2B\">The data provide evidence that the proportion of U.S. adults who support the death penalty for convicted murderers has changed since 2003, and we are 95% confident that it is now between .645 and .705 (i.e., between 64.5% and 70.5%).<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"examplewrap\">\n<div class=\"exHead\"><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p id=\"N10B31\">Let\u2019s look at our example 1 to see how a confidence interval following a test might be insightful in a different way.<\/p>\n<p id=\"N10B34\">Here is a summary of example 1:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"_i_1\" class=\"img-responsive popimg aligncenter\" title=\"A large circle represents the population of products produced by the machine (following the repair). We want to know p for this population, the proportion of defective products. 
The two hypotheses are H_0: p = .20 and H_a: p &lt; .20. We take a sample of 400 products, represented by a smaller circle. We find that 64 of these are defective. p-hat = 64\/400 = .16, z = -2, and the p-value is .023. Since the p-value is small, we conclude that H_0 can be rejected.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m1_inference_one_variable\/webcontent\/image275.gif\" alt=\"A large circle represents the population of products produced by the machine (following the repair). We want to know p for this population, the proportion of defective products. The two hypotheses are H_0: p = .20 and H_a: p &lt; .20. We take a sample of 400 products, represented by a smaller circle. We find that 64 of these are defective. p-hat = 64\/400 = .16, z = -2, and the p-value is .023. Since the p-value is small, we conclude that H_0 can be rejected.\" \/><\/span><\/span><\/p>\n<p id=\"N10B3D\">We conclude that as a result of the repair, the proportion of defective products has been reduced to below .20 (which was the proportion prior to the repair). It is probably of great interest to the company not only to know that the proportion of defective products has been reduced, but also to estimate what it has been reduced to, in order to get a better sense of how effective the repair was. A 95% confidence interval for p in this case is:<\/p>\n<p>[latex].16\\pm2\\sqrt{\\frac{.16(1-.16)}{400}}\\approx.16\\pm.037=\\left(.123,\\ .197\\right)[\/latex]<\/p>\n<p id=\"N10BB7\">We can therefore say that the data provide evidence that the proportion of defective products has been reduced, and we are 95% confident that it has been reduced to somewhere between 12.3% and 19.7%. 
This is very useful information, since it tells us that even though the results were significant (i.e., the repair reduced the number of defective products), the repair might not have been effective enough if it reduced the proportion of defective products only to somewhere in the range given by the confidence interval. This, of course, ties back to the idea of statistical significance vs. practical importance that we discussed earlier. Even though the results are significant (H<sub>o<\/sub>\u00a0was rejected), practically speaking, the repair might be considered ineffective.<\/p>\n<\/div>\n<\/div>\n<h2><span title=\"Quick scroll up\">Let\u2019s summarize<\/span><\/h2>\n<p id=\"N10BC9\">Even though this unit is about the z-test for the population proportion, it is loaded with very important ideas that apply to hypothesis testing in general. We\u2019ve already summarized the details that are specific to the z-test for proportions, so the purpose of this summary is to highlight the general ideas.<\/p>\n<p id=\"N10BCC\">The process of hypothesis testing has four steps:<\/p>\n<p id=\"N10BCF\"><em>I. Stating the null and alternative hypotheses (H<sub>o<\/sub>\u00a0and H<sub>a<\/sub>).<\/em><\/p>\n<p id=\"N10BDB\"><em>II.<\/em>\u00a0Obtaining a random sample (or at least one that can be considered random) and collecting data. Using the data:<\/p>\n<p id=\"N10BE1\">*\u00a0<em>Check that the conditions<\/em>\u00a0under which the test can be reliably used are met.<\/p>\n<p id=\"N10BE7\">*\u00a0<em>Summarize the data using a test statistic.<\/em><\/p>\n<p id=\"N10BED\">The test statistic is a measure of the evidence in the data against H<sub>o<\/sub>. The larger the test statistic is in magnitude, the more evidence the data present against H<sub>o<\/sub>.<\/p>\n<p id=\"N10BF6\"><em>III. 
Finding the p-value of the test.<\/em><\/p>\n<p id=\"N10BFC\">The p-value is the probability of getting data like those observed (or data even more extreme) assuming that the null hypothesis is true, and is calculated using the null distribution of the test statistic. The p-value is a measure of the evidence against H<sub>o<\/sub>. The smaller the p-value, the more evidence the data present against H<sub>o<\/sub>.<\/p>\n<p id=\"N10C05\"><em>IV. Making conclusions.<\/em><\/p>\n<p id=\"N10C0B\">\u2013 Conclusions about the\u00a0<em>significance of the results:<\/em><\/p>\n<p id=\"N10C11\">If the p-value is small, the data present enough evidence to reject H<sub>o<\/sub>\u00a0(and accept H<sub>a<\/sub>).<\/p>\n<p id=\"N10C1A\">If the p-value is not small, the data do not provide enough evidence to reject H<sub>o<\/sub>.<\/p>\n<p id=\"N10C20\">To help guide our decision, we use the significance level as a cutoff for what is considered a small p-value. The significance cutoff is usually set at .05, but should not be considered inviolable.<\/p>\n<p id=\"N10C23\">\u2013 Conclusions\u00a0<em>in the context<\/em>\u00a0of the problem.<\/p>\n<p id=\"N10C29\">Results that are based on a larger sample carry more weight, and therefore,\u00a0<em>for an effect of a given size, results become more significant as the sample size increases.<\/em><\/p>\n<p id=\"N10C2F\">Even a very small and practically unimportant effect becomes statistically significant with a large enough sample size. The\u00a0<em>distinction between statistical significance and practical importance<\/em>\u00a0should therefore always be considered.<\/p>\n<p id=\"N10C35\">For given data, the\u00a0<em>p-value of the two-sided test is twice as large as the p-value of the one-sided test<\/em>\u00a0(provided the data fall in the direction specified by the one-sided alternative). It is therefore harder to reject H<sub>o<\/sub>\u00a0in the two-sided case than in the one-sided case, in the sense that stronger evidence is required. 
Intuitively, the hunch or information that leads us to use the one-sided test can be regarded as a head-start toward the goal of rejecting H<sub>o<\/sub>.<\/p>\n<p id=\"N10C41\"><em>Confidence intervals can be used in order to carry out two-sided tests<\/em>\u00a0(at the .05 significance level). If the null value is not included in the confidence interval (i.e., is not one of the plausible values for the parameter), we have enough evidence to reject H<sub>o<\/sub>. Otherwise, we cannot reject H<sub>o<\/sub>.<\/p>\n<p id=\"N10C4C\">If the results are significant, it might be of interest to\u00a0<em>follow up the tests with a confidence interval<\/em> in order to get insight into the actual value of the parameter of interest.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"author":150,"menu_order":3,"template":"","meta":{"pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[48],"contributor":[],"license":[],"class_list":["post-557","chapter","type-chapter","status-publish","hentry","chapter-type-numberless"],"part":421,"_links":{"self":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters\/557","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/users\/150"}],"version-history":[{"count":23,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters\/557\/revisions"}],"predecessor-version":[{"id":1008,"href":"https:\/\/pressbooks.ccc
online.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters\/557\/revisions\/1008"}],"part":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/parts\/421"}],"metadata":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters\/557\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/media?parent=557"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapter-type?post=557"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/contributor?post=557"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/license?post=557"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}