{"id":577,"date":"2024-10-18T02:51:29","date_gmt":"2024-10-18T02:51:29","guid":{"rendered":"https:\/\/pressbooks.ccconline.org\/mat1260\/?post_type=chapter&#038;p=577"},"modified":"2025-01-28T23:33:16","modified_gmt":"2025-01-28T23:33:16","slug":"11-2-chi-squared-test-for-independence","status":"publish","type":"chapter","link":"https:\/\/pressbooks.ccconline.org\/mat1260\/chapter\/11-2-chi-squared-test-for-independence\/","title":{"raw":"11.2: Chi-Squared Test for Independence","rendered":"11.2: Chi-Squared Test for Independence"},"content":{"raw":"<div id=\"N10AFF\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2>Overview<\/h2>\r\n<p id=\"N10B06\">The last three procedures that we studied (two-sample t, paired t, and ANOVA) all involve the relationship between a categorical explanatory variable and a quantitative response variable, corresponding to Case C\u2192Q in the role\/type classification table below. Next, we will consider inferences about the relationships between two categorical variables, corresponding to case C\u2192C.<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"_i_0\" class=\"img-responsive popimg aligncenter\" title=\"It is possible for any type of explanatory variable to be paired with any type of response variable. The possible pairings are: Categorical Explanatory \u2192 Categorical Response (C\u2192C), Categorical Explanatory \u2192 Quantitative Response (C\u2192Q), Quantitative Explanatory \u2192 Categorical Response (Q\u2192C), and Quantitative Explanatory \u2192 Quantitative Response (Q\u2192Q).\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image182.gif\" alt=\"It is possible for any type of explanatory variable to be paired with any type of response variable. 
The possible pairings are: Categorical Explanatory \u2192 Categorical Response (C\u2192C), Categorical Explanatory \u2192 Quantitative Response (C\u2192Q), Quantitative Explanatory \u2192 Categorical Response (Q\u2192C), and Quantitative Explanatory \u2192 Quantitative Response (Q\u2192Q).\" \/><\/span><\/span>\r\n<p id=\"N10B0F\">In the Exploratory Data Analysis unit of the course, we summarized the relationship between two categorical variables for a given data set (using a two-way table and conditional percents), without trying to generalize beyond the sample data.<\/p>\r\n<p id=\"N10B12\">Now we perform statistical inference for two categorical variables, using the sample data to draw conclusions about whether or not we have evidence that the variables are related in the larger population from which the sample was drawn. In other words, we would like to assess whether the relationship between X and Y that we observed in the data is due to a real relationship between X and Y in the population or if it is something that could have happened just by chance due to sampling variability.<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"_i_1\" class=\"img-responsive popimg aligncenter\" title=\"We have a population of interest and a question about it, which is &quot;Are the two categorical variables X and Y related?&quot; We take an SRS of size n, and summarize that data with a two-way table. Via inference, we can decide if the relationship is strong enough that we can conclude that it is due to a true relationship in the population. This inference step is what this section goes over.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image123.gif\" alt=\"We have a population of interest and a question about it, which is &quot;Are the two categorical variables X and Y related?&quot; We take an SRS of size n, and summarize that data with a two-way table. 
Via inference, we can decide if the relationship is strong enough that we can conclude that it is due to a true relationship in the population. This inference step is what this section goes over.\" \/><\/span><\/span>\r\n<p id=\"N10B1B\">The statistical test that will answer this question is called the\u00a0<em>chi-square test for independence<\/em>. Chi is a Greek letter that looks like this: [latex]\\chi[\/latex], so the test is sometimes referred to as: The [latex]\\chi ^{2}[\/latex] test for independence.<\/p>\r\n<p id=\"N10B3D\">The structure of this section will be very similar to that of the previous ones in this module. We will first present our leading example, and then introduce the chi-square test by going through its 4 steps, illustrating each one using the example. We will conclude by presenting another complete example. As usual, you\u2019ll have activities along the way to check your understanding, and to learn how to use software to carry out the test.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<p id=\"N10B42\">Let\u2019s start with our leading example.<\/p>\r\n\r\n<div class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p id=\"N10B47\">In the early 1970s, a young man challenged an Oklahoma state law that prohibited the sale of 3.2% beer to males under age 21 but allowed its sale to females in the same age group. The case (Craig v. Boren, 429 U.S. 190, 1976) was ultimately heard by the U.S. Supreme Court.<\/p>\r\n<p id=\"N10B4A\">The main justification provided by Oklahoma for the law was traffic safety. One of the 3 main pieces of data presented to the court was the result of a \u201crandom roadside survey\u201d that recorded information on gender, and whether or not the driver had been drinking alcohol in the previous two hours. 
There were a total of 619 drivers under 20 years of age included in the survey.<\/p>\r\n<p id=\"N10B4D\">Here is what the collected data looked like:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"_i_2\" class=\"img-responsive popimg aligncenter\" title=\"A table with two columns, &quot;Gender,&quot; and &quot;Drove drunk?.&quot; Each row represents one occurrence. The rows in the table (in &quot;Driver #: Gender, Drove Drunk?&quot; format): Driver 1: M, Y; Driver 2: F, N; Driver 3: F, Y; ... Driver 619: M, N;\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image126.gif\" alt=\"A table with two columns, &quot;Gender,&quot; and &quot;Drove drunk?.&quot; Each row represents one occurrence. The rows in the table (in &quot;Driver #: Gender, Drove Drunk?&quot; format): Driver 1: M, Y; Driver 2: F, N; Driver 3: F, Y; ... Driver 619: M, N;\" \/><\/span><\/span>\r\n<p id=\"N10B56\">The following two-way table summarizes the observed counts in the roadside survey:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"_i_3\" class=\"img-responsive popimg aligncenter\" title=\"A two-way table, in which the columns are labeled &quot;Yes,&quot; &quot;No,&quot; and &quot;Total.&quot; The rows are labeled &quot;Male,&quot; &quot;Female,&quot; and &quot;Total.&quot; Here is the data in the table, given in cell format (&quot;Row, Column: Value&quot;): Male, Yes: 77 Male, No: 404 Male, Total: 481 Female, Yes: 16 Female, No: 122 Female, Total: 138 Total, Yes: 93, Total, No: 526 Total, Total: 619\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image127.gif\" alt=\"A two-way table, in which the columns are labeled &quot;Yes,&quot; &quot;No,&quot; and &quot;Total.&quot; The rows are labeled &quot;Male,&quot; &quot;Female,&quot; and &quot;Total.&quot; Here is 
the data in the table, given in cell format (&quot;Row, Column: Value&quot;): Male, Yes: 77 Male, No: 404 Male, Total: 481 Female, Yes: 16 Female, No: 122 Female, Total: 138 Total, Yes: 93, Total, No: 526 Total, Total: 619\" \/><\/span><\/span>\r\n<p id=\"N10B5F\">Our task is to assess whether these results provide evidence of a significant (\u201creal\u201d) relationship between gender and drunk driving.<\/p>\r\n<p id=\"N10B62\">The following figure summarizes this example:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"_i_4\" class=\"img-responsive popimg aligncenter\" title=\"The population consists of all drivers under 20. The question we have about the population is &quot;is drunk driving (Y) related to gender (X)?&quot; To answer this, we take an SRS of size 619 via a roadside survey. The results from this survey are summarized in the two-way table given above. Using inference, we can decide whether the relationship observed in the roadside survey is strong enough that we can conclude that it is due to a real relationship between drunk driving and gender in the population.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image128.gif\" alt=\"The population consists of all drivers under 20. The question we have about the population is &quot;is drunk driving (Y) related to gender (X)?&quot; To answer this, we take an SRS of size 619 via a roadside survey. The results from this survey are summarized in the two-way table given above. 
Using inference, we can decide whether the relationship observed in the roadside survey is strong enough that we can conclude that it is due to a real relationship between drunk driving and gender in the population.\" \/><\/span><\/span>\r\n<p id=\"N10B6B\">Note that, as the figure stresses, since we are looking to see whether drunk driving is related to gender, our explanatory variable (X) is gender, and the response variable (Y) is drunk driving. Both variables are two-valued categorical variables, and therefore our two-way table of observed counts is 2-by-2. The chi-square procedure that we are going to introduce here is not limited to 2-by-2 situations; it can be applied to any r-by-c table, where r is the number of rows (corresponding to the number of values of one of the variables) and c is the number of columns (corresponding to the number of values of the other variable).<\/p>\r\n<p id=\"N10B6E\">Before we introduce the chi-square test, let\u2019s conduct an exploratory data analysis (that is, look at the data to get an initial feel for it). By doing that, we will also get a better conceptual understanding of the role of the test.<\/p>\r\n<p id=\"N10B71\"><em>Exploratory Analysis<\/em><\/p>\r\n<p id=\"N10B77\">Recall that the key to reporting appropriate summaries for a two-way table is deciding which of the two categorical variables plays the role of explanatory variable, and then calculating the conditional percentages \u2014 the percentages of the response variable for each value of the explanatory variable \u2014 separately. 
In this case, since the explanatory variable is gender, we would calculate the percentages of drivers who did (and did not) drink alcohol for males and females separately.<\/p>\r\n<p id=\"N10B7A\">Here is the table of conditional percentages:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"_i_5\" class=\"img-responsive popimg aligncenter\" title=\"A two-way table, in which the columns are labeled &quot;Yes,&quot; &quot;No,&quot; (in response to the Y variable, drank alcohol in the last 2 hours) and &quot;Total.&quot; The rows are labeled &quot;Male&quot; and &quot;Female.&quot; Here is the data in the table, given in cell format (&quot;Row, Column: Value&quot;): Male, Yes: 77\/481 = 16.0% Male, No: 404\/481 = 84.0% Male, Total: 100% Female, Yes: 16\/138 = 11.6% Female, No: 122\/138 = 88.4% Female, Total: 100%\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image129.gif\" alt=\"A two-way table, in which the columns are labeled &quot;Yes,&quot; &quot;No,&quot; (in response to the Y variable, drank alcohol in the last 2 hours) and &quot;Total.&quot; The rows are labeled &quot;Male&quot; and &quot;Female.&quot; Here is the data in the table, given in cell format (&quot;Row, Column: Value&quot;): Male, Yes: 77\/481 = 16.0% Male, No: 404\/481 = 84.0% Male, Total: 100% Female, Yes: 16\/138 = 11.6% Female, No: 122\/138 = 88.4% Female, Total: 100%\" \/><\/span><\/span>\r\n<p id=\"N10B83\">For the 619 sampled drivers, a larger percentage of males than females were found to be drunk (16.0% vs. 11.6%). Our data, in other words, provide some evidence that drunk driving is related to gender; however, this in itself is not enough to conclude that such a relationship exists in the larger population of drivers under 20. 
We need to further investigate the data and decide between the following two points of view:<\/p>\r\n\r\n<ul>\r\n \t<li>\r\n<p id=\"N10B89\">The evidence provided by the roadside survey (16% vs 11.6%) is strong enough to conclude (beyond a reasonable doubt) that it must be due to a relationship between drunk driving and gender in the population of drivers under 20.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"N10B8D\">The evidence provided by the roadside survey (16% vs. 11.6%) is not strong enough to make that conclusion, and could have happened just by chance, due to sampling variability, and not necessarily because a relationship exists in the population.<\/p>\r\n<\/li>\r\n<\/ul>\r\n<\/div>\r\n<\/div>\r\nActually, these two opposing points of view constitute the null and alternative hypotheses of the chi-square test for independence, so now that we understand our example and what we still need to find out, let\u2019s introduce the four-step process of this test.\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p id=\"N10BA1\">The purpose of this activity is to introduce you to the example that you are going to work through in this section, and for you to get a feeling for the data by conducting exploratory analysis.<\/p>\r\n<p id=\"N10BA4\">Background: Alcoholism Risk in 9\/11 Responders<\/p>\r\n<p id=\"N10BA7\">Among firefighters and other \"first responders\" to the World Trade Center on September 11, 2001, there have been reports of increased alcohol-related difficulties (e.g., DUI). A survey of 9\/11 first responders (On the Front Line: The Work of First Responders in a Post-9\/11 World) conducted by Cornell researcher Samuel Bacharach was released in 2004. To see the report: <a href=\"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-content\/uploads\/sites\/206\/2024\/10\/FirefighterStress-compressed.pdf\">Firefighter Stress<\/a>. 
Based on the research, we can construct the following two-way table of observed counts:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"_i_7\" class=\"img-responsive popimg aligncenter\" title=\"A two-way table titled &quot;Firefighters(*) vs. Alcohol Risk, Based on a 2004 Study of NY Firefighters.&quot; The columns are &quot;No risk for alcohol problems**,&quot; &quot;Moderate to Severe risk for alcohol problems**,&quot; and &quot;Total&quot;. The rows are &quot;Participated in 911 rescue,&quot; &quot;Did Not Participate in 911 rescue,&quot; and &quot;total.&quot; Here is the data in the table, given in cell format (&quot;Row, Column: Value&quot;): Participated, No risk: 793; Participated, Moderate to severe risk: 309; Participated, Total: 1102; Did Not, No risk: 441; Did Not, Moderate to severe risk: 110; Did Not, Total: 551; Total, No Risk: 1234; Total, Moderate to Severe risk: 419; Total, Total: 1653. (*): does not include officers (**): as defined by the DSM criteria (also used by the National Institute on Alcohol Abuse) and determined by survey results.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image422.gif\" alt=\"A two-way table titled &quot;Firefighters(*) vs. Alcohol Risk, Based on a 2004 Study of NY Firefighters.&quot; The columns are &quot;No risk for alcohol problems**,&quot; &quot;Moderate to Severe risk for alcohol problems**,&quot; and &quot;Total&quot;. The rows are &quot;Participated in 911 rescue,&quot; &quot;Did Not Participate in 911 rescue,&quot; and &quot;total.&quot; Here is the data in the table, given in cell format (&quot;Row, Column: Value&quot;): Participated, No risk: 793; Participated, Moderate to severe risk: 309; Participated, Total: 1102; Did Not, No risk: 441; Did Not, Moderate to severe risk: 110; Did Not, Total: 551; Total, No Risk: 1234; Total, Moderate to Severe risk: 419; Total, Total: 1653. 
(*): does not include officers (**): as defined by the DSM criteria (also used by the National Institute on Alcohol Abuse) and determined by survey results.\" \/><\/span><\/span>\r\n<p id=\"N10BB6\">Using the data from this research, we would like to investigate whether alcohol risk among New York firefighters is significantly related to participation in the 9\/11 rescue.<\/p>\r\n\r\n<div class=\"asx \">\r\n<div id=\"du4_m4_cc1_tutor1\" class=\"activitywrap sectionNest flash\">\r\n<div class=\"activityhead\">\r\n<div class=\"activityinfo\"><\/div>\r\n<\/div>\r\n<div class=\"actContain\">\r\n<div class=\"activity flash\">\r\n<div id=\"u4_m4_cc1_tutor1\" class=\"flash_obj asx testFlash mark_flash\">\r\n<div id=\"ou4_m4_cc1_tutor1\" class=\"page 2963397\">\r\n<div id=\"2963397\" class=\"question\">\r\n<div>\r\n<p id=\"N1006E\">There are two categorical variables in this problem:<\/p>\r\n<p id=\"N10070\">* Alcohol risk (none, moderate to severe)<\/p>\r\n<p id=\"N10072\">* Participation in the 9\/11 rescue (yes, no)<\/p>\r\n[h5p id=\"235\"]\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div id=\"aef2ead184eb4d13b662dc995abc3cdf\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">The Chi-Square Test for Independence<\/span><\/h2>\r\n<p id=\"d18342f7b19940b386a4e72433910b9f\">The chi-square test for independence examines our observed data and tells us whether we have enough evidence to conclude beyond a reasonable doubt that two categorical variables are related. Much like the previous part on the ANOVA F-test, we are going to introduce the hypotheses (step 1), and then discuss the idea behind the test, which will naturally lead to the test statistic (step 2). 
Let\u2019s start.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div id=\"c686cc29a7f44729a30b20e7f882d9fc\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Step 1: Stating the hypotheses<\/span><\/h2>\r\n<p id=\"f4cb1020357f4f43b383087fef008f82\">Unlike all the previous tests that we presented, the null and alternative hypotheses in the chi-square test are stated in words rather than in terms of population parameters. They are:<\/p>\r\n<p id=\"bcf23b698f88418196327d19042993d3\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">o<\/em><\/sub>: There is no relationship between the two categorical variables. (They are independent.)<\/p>\r\n<p id=\"c9131bddb6864ca8842f7e5d5d5d21a2\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">a<\/em><\/sub>: There is a relationship between the two categorical variables. (They are not independent.)<\/p>\r\n\r\n<div id=\"d525f2f5cc764ca8980a56bb2a945c8f\" class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div>\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p id=\"d74a7c01687b4da0a7d8c4913a027797\">In our example, the null and alternative hypotheses would then state:<\/p>\r\n<p id=\"b16aa766db3547b2958378478bc27bd1\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">o<\/em><\/sub>: There is no relationship between gender and drunk driving.<\/p>\r\n<p id=\"eedcff2c6d52451db8da3b2d65f1b11c\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">a<\/em><\/sub>: There is a relationship between gender and drunk driving.<\/p>\r\n<p id=\"d89a0dcb1b3f404c95358ee3451b945d\">Or equivalently,<\/p>\r\n<p id=\"af9bc2dfd74949e482bbd22effec725c\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">o<\/em><\/sub>: Drunk driving and gender are independent<\/p>\r\n<p id=\"befc6bf1e0e04b7784834b758e73c91d\"><em class=\"italic\">H<\/em><sub><em 
class=\"italic\">a<\/em><\/sub>: Drunk driving and gender are not independent<\/p>\r\n<p id=\"f27562f5f81044a7bd27528700f747ac\">and hence the name \u201cchi-square test for independence.\u201d<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div id=\"d7b9768cc9914c5da6227a08fba3e30c\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Comment<\/span><\/h2>\r\n<p id=\"cedff90d834b4b34a70dd2e180c1cda1\">Algebraically, independence between gender and driving drunk is equivalent to having equal proportions who drank (or did not drink) for males vs. females. In fact, the null and alternative hypotheses could have been re-formulated as<\/p>\r\n<p id=\"e78e856b900547cc8aa7e3643d3bee81\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">o<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0proportion of male drunk drivers = proportion of female drunk drivers<\/p>\r\n<p id=\"b82a15c380de481d8308f89974ceb8f7\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">a<\/em><\/sub><em class=\"italic\">:\u00a0<\/em>proportion of male drunk drivers \u2260 proportion of female drunk drivers<\/p>\r\n<p id=\"c223590fbef1471082a205a760fe152c\">Expressing the hypotheses in terms of proportions works well and is quite intuitive for two-by-two tables, but the formulation becomes very cumbersome when at least one of the variables has several possible values, not just two. 
We will therefore always use the \u201cwordy\u201d form of the hypotheses presented in step 1 above.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div id=\"e9f0766b5b754c468974f4918b58ef10\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">The Idea of the Chi-Square Test<\/span><\/h2>\r\n<p id=\"d2403ca399c647178d5baa78e8e0cf06\">The idea behind the chi-square test, much like previous tests that we\u2019ve introduced, is to measure how far the data are from what is claimed in the null hypothesis. The further the data are from the null hypothesis, the more evidence the data present against it. We\u2019ll use our data to develop this idea. Our data are represented by the observed counts:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"a9ed2a49426d4bc2adcd61accfb079c7\" class=\"img-responsive popimg aligncenter\" title=\"The two-way table with counts. The cells which are not in a Total row or column are the observed counts. Full description: A two-way table, in which the columns are labeled &quot;Yes,&quot; &quot;No,&quot; and &quot;Total.&quot; The rows are labeled &quot;Male,&quot; &quot;Female,&quot; and &quot;Total.&quot; Here is the data in the table, given in cell format (&quot;Row, Column: Value&quot;): Male, Yes: 77; Male, No: 404; Male, Total: 481; Female, Yes: 16; Female, No: 122; Female, Total: 138; Total, Yes: 93; Total, No: 526; Total, Total: 619; The observed counts are Male, Yes; Male, No; Female, Yes; Female, No;\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image131.gif\" alt=\"The two-way table with counts. The cells which are not in a Total row or column are the observed counts. 
Full description: A two-way table, in which the columns are labeled &quot;Yes,&quot; &quot;No,&quot; and &quot;Total.&quot; The rows are labeled &quot;Male,&quot; &quot;Female,&quot; and &quot;Total.&quot; Here is the data in the table, given in cell format (&quot;Row, Column: Value&quot;): Male, Yes: 77; Male, No: 404; Male, Total: 481; Female, Yes: 16; Female, No: 122; Female, Total: 138; Total, Yes: 93; Total, No: 526; Total, Total: 619; The observed counts are Male, Yes; Male, No; Female, Yes; Female, No;\" \/><\/span><\/span>\r\n<p id=\"ab0c2a81916847e0beffbcb86e21c27a\">How will we represent the null hypothesis?<\/p>\r\n<p id=\"c4607985bb2f460991e69cdf829434ab\">In the previous tests we introduced, the null hypothesis was represented by the null value. Here there is not really a null value, but rather a claim that the two categorical variables (drunk driving and gender, in this case) are independent.<\/p>\r\n<p id=\"d44f4499c0fd4853a048bd889cf5bbe4\">To represent the null hypothesis, we will calculate another set of counts \u2014 the counts that we would expect to see (instead of the observed ones) if drunk driving and gender were really independent (i.e., if H<sub>o<\/sub>\u00a0were true). For example, we actually observed 77 males who drove drunk; if drunk driving and gender were indeed independent (if H<sub>o<\/sub>\u00a0were true), how many male drunk drivers would we expect to see instead of 77? 
Similarly, we can ask the same kind of question about (and calculate) the other three cells in our table.<\/p>\r\n<p id=\"aaf3052ee6e44bb48ea7426a3049e034\">In other words, we will have two sets of counts:<\/p>\r\n\r\n<ul id=\"b8411479b51244e4b36aeeb3928ddd8f\">\r\n \t<li>\r\n<p id=\"c94a44bc2fee48799ace6e67f2e48a71\">the observed counts (the data)<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"c63e4db632f0446c8d77a4d2a0bfbb2d\">the expected counts (if H<sub>o<\/sub>\u00a0were true)<\/p>\r\n<\/li>\r\n<\/ul>\r\n<p id=\"b4c97e60335348dba00fbbe2200d0d14\">We will measure how far the observed counts are from the expected ones. Ultimately, we will base our decision on the size of the discrepancy between what we observed and what we would expect to observe if H<sub>o<\/sub>\u00a0were true.<\/p>\r\n<p id=\"c0191e125b144c2fbe753ccd391031ab\">How are the expected counts calculated? Once again, we are in need of probability results. Recall from the probability section that if events A and B are independent, then P(A and B) = P(A) * P(B). 
We use this rule for calculating expected counts, one cell at a time.<\/p>\r\n<p id=\"f3e682632f0d402d9dd675658f9fef1d\">Here again are the observed counts:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"da94bd7714d94826898ce112ebc4b2de\" class=\"img-responsive popimg aligncenter\" title=\"A two-way table, in which the columns are labeled &quot;Yes,&quot; &quot;No,&quot; and &quot;Total.&quot; The rows are labeled &quot;Male,&quot; &quot;Female,&quot; and &quot;Total.&quot; Here is the data in the table, given in cell format (&quot;Row, Column: Value&quot;): Male, Yes: 77; Male, No: 404; Male, Total: 481; Female, Yes: 16; Female, No: 122; Female, Total: 138; Total, Yes: 93; Total, No: 526; Total, Total: 619;\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image132.gif\" alt=\"A two-way table, in which the columns are labeled &quot;Yes,&quot; &quot;No,&quot; and &quot;Total.&quot; The rows are labeled &quot;Male,&quot; &quot;Female,&quot; and &quot;Total.&quot; Here is the data in the table, given in cell format (&quot;Row, Column: Value&quot;): Male, Yes: 77; Male, No: 404; Male, Total: 481; Female, Yes: 16; Female, No: 122; Female, Total: 138; Total, Yes: 93; Total, No: 526; Total, Total: 619;\" \/><\/span><\/span>\r\n<p id=\"cd79f15275204594b5e3aaaa8e8677a5\">Applying the rule to the first (top left) cell, if driving drunk and gender were independent, then:<\/p>\r\n<p id=\"de9e5605242544a8a07726814c525122\">P(Drunk and Male) = P(Drunk) * P(Male)<\/p>\r\n<p id=\"a4231dd0692a4d1d986eee775a326856\">By dividing the counts in our table, we see that:<\/p>\r\n<p id=\"f8bed347190a44fea3da273f9ba931e1\">P(Drunk) = 93 \/ 619 and<\/p>\r\n<p id=\"b3a32553357d495ab1ae7f8795c493f8\">P(Male) = 481 \/ 619,<\/p>\r\n<p 
id=\"d49f3665c13544c79710a57e85fd5f8c\">and so,<\/p>\r\n<p id=\"abf7f8795f77447da39cdb3a2a0a4bff\">P(Drunk and Male) = (93 \/ 619) (481 \/ 619)<\/p>\r\n<p id=\"e095f87aadc14e8195b30aee6d3bd814\">Therefore, since there are a total of 619 drivers,\u00a0<em class=\"italic\">if drunk driving and gender were independent<\/em>, the\u00a0<em class=\"italic\">count\u00a0<\/em>of drunk male drivers that we would\u00a0<em class=\"italic\">expect<\/em>\u00a0to see is:<\/p>\r\n[latex]619*P(Drunk\\ and\\ Male)=619\\left ( \\frac{93}{619} \\right )\\left ( \\frac{481}{619} \\right )=\\frac{93*481}{619}[\/latex]\r\n<p id=\"eb2622b262c34ca4821260cd59170b24\">Notice that this expression is the product of the column and row totals for that particular cell, divided by the overall table total.<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"a2dea1be38dc435dac7704466d99773c\" class=\"img-responsive popimg aligncenter\" title=\"P(Drunk and Male) is calculated using 3 cells from the two-way table. These are the row total (Male, Total) cell, the column total (Total, Yes) cell, and table total (Total, Total) cell. P(Drunk and Male) = (column total * row total)\/(table total)\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image133.gif\" alt=\"P(Drunk and Male) is calculated using 3 cells from the two-way table. These are the row total (Male, Total) cell, the column total (Total, Yes) cell, and table total (Total, Total) cell. 
P(Drunk and Male) = (column total * row total)\/(table total)\" \/><\/span><\/span>\r\n<p id=\"ae705ae0a37f4200a9abfdb7c2512d92\">Similarly, if the variables are independent,<\/p>\r\n<p id=\"f57cb8e5a3cc409092ef4b66d64b70e5\">P(Drunk and Female) = P(Drunk) * P(Female) = (93 \/ 619) (138 \/ 619)<\/p>\r\n<p id=\"f35f1547464c43f5bdc06cca827ba492\">and the expected count of females driving drunk would be<\/p>\r\n[latex]619\\ast P(Drunk\\ and\\ Female)=619\\left(\\frac{93}{619}\\right)\\left(\\frac{138}{619}\\right)=\\frac{93\\ast138}{619}[\/latex]\r\n<p id=\"de35a57a4a234e34881825fbfa821daf\">Again, the expected count equals the product of the corresponding column and row totals, divided by the overall table total:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"b7348a857c7146a19c6d99031a16858b\" class=\"img-responsive popimg aligncenter\" title=\"P(Drunk and Female) is calculated using 3 cells from the two-way table. These are the row total (Female, Total) cell, the column total (Total, Yes) cell, and table total (Total, Total) cell. P(Drunk and Female) = (column total * row total)\/(table total)\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image134.gif\" alt=\"P(Drunk and Female) is calculated using 3 cells from the two-way table. These are the row total (Female, Total) cell, the column total (Total, Yes) cell, and table total (Total, Total) cell. 
P(Drunk and Female) = (column total * row total)\/(table total)\" \/><\/span><\/span>\r\n<p id=\"f21c1499ae9e471080c59c2e2b52421b\">This will always be the case, and will help streamline our calculations:<\/p>\r\n[latex]Expected\\ Count=\\frac{Column\\ total\\ \\ast Row\\ total}{Table\\ total}[\/latex]\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\n[h5p id=\"236\"]\r\n\r\n<\/div>\r\n<\/div>\r\n<p id=\"f79d31dc9b1d40dba866091388c943c6\">Here is the complete table of expected counts, followed by the table of observed counts:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"d3cbcfa8767d4ad293d59bbe7d8fb41e\" class=\"img-responsive popimg aligncenter\" title=\"A two-way table for expected counts, in which the columns are labeled &quot;Yes,&quot; &quot;No,&quot; and &quot;Total.&quot; The rows are labeled &quot;Male,&quot; &quot;Female,&quot; and &quot;Total.&quot; Here is the data in the table, given in cell format (&quot;Row, Column: Value&quot;): Male, Yes: (93 * 481)\/619 = 72.3; Male, No: (526 * 481)\/619 = 408.7; Male, Total: 481; Female, Yes: (93 * 138)\/619 = 20.7; Female, No: (526 * 138)\/619 = 117.3; Female, Total: 138; Total, Yes: 93; Total, No: 526; Total, Total: 619;\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image136.gif\" alt=\"A two-way table for expected counts, in which the columns are labeled &quot;Yes,&quot; &quot;No,&quot; and &quot;Total.&quot; The rows are labeled &quot;Male,&quot; &quot;Female,&quot; and &quot;Total.&quot; Here is the data in the table, given in cell format (&quot;Row, Column: Value&quot;): Male, Yes: (93 * 481)\/619 = 72.3; Male, No: (526 * 481)\/619 = 
408.7; Male, Total: 481; Female, Yes: (93 * 138)\/619 = 20.7; Female, No: (526 * 138)\/619 = 117.3; Female, Total: 138; Total, Yes: 93; Total, No: 526; Total, Total: 619;\" \/><\/span><\/span>\r\n\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"e12bc80c342c4aa6820a38be0eaa2104\" class=\"img-responsive popimg aligncenter\" title=\"A two-way table for observed counts, in which the columns are labeled &quot;Yes,&quot; &quot;No,&quot; and &quot;Total.&quot; The rows are labeled &quot;Male,&quot; &quot;Female,&quot; and &quot;Total.&quot; Here is the data in the table, given in cell format (&quot;Row, Column: Value&quot;): Male, Yes: 77; Male, No: 404; Male, Total: 481; Female, Yes: 16; Female, No: 122; Female, Total: 138; Total, Yes: 93; Total, No: 526; Total, Total: 619;\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image137.gif\" alt=\"A two-way table for observed counts, in which the columns are labeled &quot;Yes,&quot; &quot;No,&quot; and &quot;Total.&quot; The rows are labeled &quot;Male,&quot; &quot;Female,&quot; and &quot;Total.&quot; Here is the data in the table, given in cell format (&quot;Row, Column: Value&quot;): Male, Yes: 77; Male, No: 404; Male, Total: 481; Female, Yes: 16; Female, No: 122; Female, Total: 138; Total, Yes: 93; Total, No: 526; Total, Total: 619;\" \/><\/span><\/span>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p id=\"efdf8424cceb416f96350da66fb147ea\">A study was done on the relationship between gender and piercing among high-school students. 
A sample of 1,000 students was chosen, then classified according to gender and according to whether or not they had either of their ears pierced. The results of the study are summarized in the following 2-by-2 table:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"f70131904009407ea2e16bb6bf1ecdf3\" class=\"img-responsive popimg aligncenter\" title=\"A two-way table with &amp;quot;Yes Piercing,&amp;quot; &amp;quot;No Piercing,&amp;quot; and &amp;quot;Total&amp;quot; columns. The rows are &amp;quot;Male,&amp;quot; &amp;quot;Female,&amp;quot; and &amp;quot;Total.&amp;quot; Here is the data in the table, given in cell format (&amp;quot;Row, Column: Value&amp;quot;): Female, Yes: 576; Female, No: 64; Female, Total: 640; Male, Yes: 72; Male, No: 288; Male, Total: 360; Total, Yes: 648; Total, No: 352; Total, Total: 1000\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image425.gif\" alt=\"A two-way table with &amp;quot;Yes Piercing,&amp;quot; &amp;quot;No Piercing,&amp;quot; and &amp;quot;Total&amp;quot; columns. The rows are &amp;quot;Male,&amp;quot; &amp;quot;Female,&amp;quot; and &amp;quot;Total.&amp;quot; Here is the data in the table, given in cell format (&amp;quot;Row, Column: Value&amp;quot;): Female, Yes: 576; Female, No: 64; Female, Total: 640; Male, Yes: 72; Male, No: 288; Male, Total: 360; Total, Yes: 648; Total, No: 352; Total, Total: 1000\" \/><\/span><\/span>\r\n\r\n[h5p id=\"237\"]\r\n\r\n<\/div>\r\n<\/div>\r\nWe see that there are differences between the observed and expected counts in the respective cells. We now have to come up with a measure that will quantify these differences. 
This measure is the chi-square test statistic.\r\n<div id=\"e162a5169a8843cb838035910712238a\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Step 2: Checking the Conditions and Calculating the Test Statistic<\/span><\/h2>\r\n<p id=\"b3928ee4160d456b9d122818b36d0362\">Given our discussion above, it is natural to present the test statistic first and then come back to the conditions that allow us to safely use the chi-square test, although in practice this is done the other way around.<\/p>\r\n<p id=\"fba287d452d645269a1f37e16993b3d6\">The single number that summarizes the overall difference between observed and expected counts is the chi-square statistic\u00a0<span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-msup\"><span class=\"mjx-base\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03c7<\/span><\/span><\/span><span class=\"mjx-sup\"><span class=\"mjx-mn\"><span class=\"mjx-char MJXc-TeX-main-R\">2<\/span><\/span><\/span><\/span><\/span><\/span><\/span>, which tells us in a standardized way how far what we observed (data) is from what would be expected if H<sub>o<\/sub>\u00a0were true.<\/p>\r\n<p id=\"fed43793f07a420585addca3e1230903\">Here it is:<\/p>\r\n[latex]\\chi^2=\\sum_{all\\ cells}\\frac{\\left(Observed\\ Count-Expected\\ Count\\right)^2}{Expected\\ Count}[\/latex]\r\n\r\n<\/div>\r\n<\/div>\r\n<div id=\"c95ab22af4714b96bb076cfb24d83f7e\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Comment<\/span><\/h2>\r\n<p id=\"b6ca4b0efe2543a8abb5c2286fbaaa30\">As we expected,\u00a0<span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-msup\"><span class=\"mjx-base\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03c7<\/span><\/span><\/span><sup><span class=\"mjx-sup\"><span class=\"mjx-mn\"><span 
class=\"mjx-char MJXc-TeX-main-R\">2<\/span><\/span><\/span><\/sup><\/span><\/span><\/span><\/span>\u00a0is based on each of the differences: observed count \u2013 expected count (one such difference for each cell), but why is each difference squared? Why do we divide each squared difference by the expected count? We do so in order that\u00a0<span id=\"MathJax-Element-4-Frame\" class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-msup\"><span class=\"mjx-base\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03c7<\/span><\/span><\/span><sup><span class=\"mjx-sup\"><span class=\"mjx-mn\"><span class=\"mjx-char MJXc-TeX-main-R\">2<\/span><\/span><\/span><\/sup><\/span><\/span><\/span><\/span>\u00a0will have a known null distribution (from which p-values can be easily calculated). The details are beyond the scope of this course, but we will just say that the null distribution of\u00a0<span id=\"MathJax-Element-5-Frame\" class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-msup\"><span class=\"mjx-base\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03c7<\/span><\/span><\/span><sup><span class=\"mjx-sup\"><span class=\"mjx-mn\"><span class=\"mjx-char MJXc-TeX-main-R\">2<\/span><\/span><\/span><\/sup><\/span><\/span><\/span><\/span>\u00a0is called chi-square (which is not very surprising given that the test is called the chi-square test), and that, like the t-distributions, there are many chi-square distributions, distinguished by the number of degrees of freedom associated with them.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div id=\"ada8c09578ea4f02bf0555967836b33e\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Conditions under Which the Chi-Square Test Can Safely Be Used<\/span><\/h2>\r\n<ol id=\"c8346e9c11b34abcbe9214acd20480d8\" class=\"lower-roman\">\r\n \t<li>\r\n<p 
id=\"cb3502f65fec4046b8617164a922f111\">The sample should be random.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"ad252e4f548c4392b98030d66f32765b\">In general, the larger the sample, the more accurate and reliable the test results are. There are different versions of what the conditions are that will ensure reliable use of the test, all of which involve the expected counts. One version of the conditions says that all expected counts need to be greater than 1, and at least 80% of expected counts need to be greater than 5. A more conservative version requires that all expected counts are larger than 5.<\/p>\r\n<\/li>\r\n<\/ol>\r\n<div id=\"c18a5785e83841ddb07ff5ff723d2fb7\" class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p id=\"ddcc2e8038f440d99acc6ca1095a2d19\">Here, again, are the observed and expected counts.<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"d7eca2291f9a4a99bd20f569106649f8\" class=\"img-responsive popimg aligncenter\" title=\"A two-way table for observed and expected counts, in which the columns are labeled &amp;quot;Yes,&amp;quot; &amp;quot;No,&amp;quot; and &amp;quot;Total.&amp;quot; The rows are labeled &amp;quot;Male,&amp;quot; &amp;quot;Female,&amp;quot; and &amp;quot;Total.&amp;quot; Here is the data in the table, given in cell format (&amp;quot;Row, Column: Value&amp;quot;): Male, Yes: observed: 77 expected: 72.3 Male, No: observed: 404 expected: 408.7 Male, Total: 481 Female, Yes: observed: 16, expected: 20.7 Female, No: observed: 122, expected: 117.3 Female, Total: 138 Total, Yes: 93, Total, No: 526 Total, Total: 619\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image141.gif\" alt=\"A two-way table for observed and expected counts, in which 
the columns are labeled &amp;quot;Yes,&amp;quot; &amp;quot;No,&amp;quot; and &amp;quot;Total.&amp;quot; The rows are labeled &amp;quot;Male,&amp;quot; &amp;quot;Female,&amp;quot; and &amp;quot;Total.&amp;quot; Here is the data in the table, given in cell format (&amp;quot;Row, Column: Value&amp;quot;): Male, Yes: observed: 77 expected: 72.3 Male, No: observed: 404 expected: 408.7 Male, Total: 481 Female, Yes: observed: 16, expected: 20.7 Female, No: observed: 122, expected: 117.3 Female, Total: 138 Total, Yes: 93, Total, No: 526 Total, Total: 619\" \/><\/span><\/span>\r\n<p id=\"f4fb35ef4c664e75a46683c260cc3f70\">Checking the conditions:<\/p>\r\n\r\n<ol id=\"a3602a79a8c14bb5a756bc72ef8bcf53\" class=\"lower-roman\">\r\n \t<li>\r\n<p id=\"a159a94029244bb7b5ef81bce0c62adc\">The roadside survey is known to have been random.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"af49206e0cea479ea88d8d42e24bdc26\">All the expected counts are above 5.<\/p>\r\n<p id=\"c1f425821c8d4e8598c67598de39bda9\">We can therefore safely proceed with the chi-square test, and the chi-square test statistic is:<\/p>\r\n[latex]\\frac{\\left(77-72.3\\right)^2}{72.3}+\\frac{\\left(404-408.7\\right)^2}{408.7}+\\frac{\\left(16-20.7\\right)^2}{20.7}+\\frac{\\left(122-117.3\\right)^2}{117.3}=.306+.054+1.067+.188=1.62[\/latex]<\/li>\r\n<\/ol>\r\n<\/div>\r\n<\/div>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p id=\"f5a14d97bb9d4451aaddf0055b93c5eb\">A study was done on the relationship between gender and piercing among high-school students. A sample of 1,000 students was chosen, and then classified according to both gender and whether or not they had either of their ears pierced. 
The following (edited) StatCrunch output is available:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"cd33a0db5e71492d93889494c63e8c34\" class=\"img-responsive popimg aligncenter\" title=\" Female, Pierced: Count: 576, Expected Count: 414.7 Female, No Pierced: Count: 64, Expected Count: 225.3 Female, Total: 640 Male, Pierced: Count: 72, Expected Count: 233.3 Male, No Pierced: Count: 288, Expected Count: 126.7 Male, Total: 360 Total, Pierced: 648, No Pierced: 352, Total: 1000 Chi-Squared test: Chi-Square: DF = 1, P-value &amp;lt; 0.0001\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image426_statcrunch.gif\" alt=\" Female, Pierced: Count: 576, Expected Count: 414.7 Female, No Pierced: Count: 64, Expected Count: 225.3 Female, Total: 640 Male, Pierced: Count: 72, Expected Count: 233.3 Male, No Pierced: Count: 288, Expected Count: 126.7 Male, Total: 360 Total, Pierced: 648, No Pierced: 352, Total: 1000 Chi-Squared test: Chi-Square: DF = 1, P-value &amp;lt; 0.0001\" \/><\/span><\/span>\r\n\r\n[h5p id=\"238\"]\r\n\r\n<\/div>\r\n<\/div>\r\n<h2><span title=\"Quick scroll up\">Comment<\/span><\/h2>\r\n<p id=\"cede765c8b67485fa95c8e087069349f\">Once the chi-square statistic has been calculated, we can get a feel for its size: is there a relatively large difference between what we observed and what the null hypothesis claims, or a relatively small one? It turns out that for a 2-by-2 case like ours, we are inclined to call the chi-square statistic \u201clarge\u201d if it is larger than 3.84, the cut-off that corresponds to a significance level of .05 with 1 degree of freedom. Our test statistic of 1.62 is therefore not large, indicating that the data are not different enough from the null hypothesis for us to reject it (we will also see that in the p-value not being small). For other cases (other than 2-by-2) there are different cut-offs for what is considered large, which are determined by the null distribution in that case. 
We are therefore going to rely only on the p-value to draw our conclusions. Even though we cannot really use the chi-square statistic, it was important to learn about it, since it encompasses the idea behind the test.<\/p>\r\n\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p id=\"fd5b03c1d91c42d7a8097e35c1c740ca\">The purpose of this activity is to continue to explore whether the risk of alcohol problems among New York firefighters and first responders is related to participation in the 9\/11 rescue. In particular, in this activity, we will state the hypotheses that are being tested, learn how to carry out the chi-square test for independence using statistical software, and check whether the conditions under which this test can be safely used are met.<\/p>\r\n\r\n<table id=\"c8af3840295240c9abbd714bcbbfbeff_bx\" class=\"table labeled aligncenter\">\r\n<thead>\r\n<tr>\r\n<th>Observed data<\/th>\r\n<\/tr>\r\n<\/thead>\r\n<tfoot>\r\n<tr>\r\n<td class=\"captionwrap\">\r\n<p id=\"ac915985ae7244abd960f688d89b42a2c\">New York Firefighters and first responders<\/p>\r\n&nbsp;<\/td>\r\n<\/tr>\r\n<\/tfoot>\r\n<tbody>\r\n<tr>\r\n<td>\r\n<table id=\"c8af3840295240c9abbd714bcbbfbeff\" class=\"grid\" cellspacing=\"0\" align=\"center\">\r\n<thead>\r\n<tr>\r\n<th colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"aa58eda83f9384db2acd68a8369673b77\"><\/p>\r\n<\/th>\r\n<th colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"ac53d63ee30d1452eb5c4e278a124b609\">No risk for alcohol problems<\/p>\r\n<\/th>\r\n<th colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"ad923c5f12ed64bf5b0706364c0561160\">Moderate to Severe risk for alcohol problems<\/p>\r\n<\/th>\r\n<th colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"ade73d6d5bdbf46beb43f1b20535f9768\">Total<\/p>\r\n<\/th>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr>\r\n<th colspan=\"1\" 
rowspan=\"1\" align=\"left\">\r\n<p id=\"afe74a1729df44a0b9876716a4a0f0853\">Participated in 911 rescue<\/p>\r\n<\/th>\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"ab1d470c99d8a42b7a210ca9e0641bbd7\">793<\/p>\r\n<\/td>\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"ab8715b0986a842daac7ea8cc6c4e2712\">309<\/p>\r\n<\/td>\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"aadbb1fd1050487c82535db0dccd08fe\"><em class=\"bold\">1102<\/em><\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr class=\"e\">\r\n<th colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"aaffbdb3df1714e75b78331c529d4ad1a\">Did Not Participate in 911 rescue<\/p>\r\n<\/th>\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"af54b9e47176b49e5bae9ae96b6df8cd6\">441<\/p>\r\n<\/td>\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"ac5d0957e734b45e3bbf3ba068060dba5\">110<\/p>\r\n<\/td>\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"f39e3227b2da4ab7a1cc4c57e308da5a\"><em class=\"bold\">551<\/em><\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<th colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"aa8b2d1d8fe64412b98ed4aa93059e721\">Total<\/p>\r\n<\/th>\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"a939da9ff1684082b1738c73b4e651f2\"><em class=\"bold\">1234<\/em><\/p>\r\n<\/td>\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"ffe554203f9a40dc903016571d23b191\"><em class=\"bold\">419<\/em><\/p>\r\n<\/td>\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"f20fdcc9748048aab2f318b3accff063\"><em class=\"bold\">1653<\/em><\/p>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n[h5p id=\"239\"]\r\n\r\n<\/div>\r\n<\/div>\r\n<div id=\"f91f62332b614080950906b1d57158a1\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Step 3: Finding the p-value<\/span><\/h2>\r\n<p id=\"a44afcc3857f481db030c8b010e9c86c\">The p-value for the chi-square 
test for independence is the probability of getting counts like those observed, assuming that the two variables are not related (which is what is claimed by the null hypothesis). The smaller the p-value, the more surprising it would be to get counts like we did, if the null hypothesis were true.<\/p>\r\n<p id=\"a04dca1d85ef42169afa93129941aaf1\">Technically, the p-value is the probability of observing\u00a0<span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-msup\"><span class=\"mjx-base\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03c7<\/span><\/span><\/span><span class=\"mjx-sup\"><span class=\"mjx-mn\"><span class=\"mjx-char MJXc-TeX-main-R\">2<\/span><\/span><\/span><\/span><\/span><\/span><\/span>\u00a0at least as large as the one observed, assuming that H<sub>o<\/sub>\u00a0is true. Using statistical software, we find that the p-value for the gender and drunk driving example is 0.201.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div id=\"b34795ce63b54ab39053342641fe1d07\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Step 4: Stating the conclusion in context<\/span><\/h2>\r\n<p id=\"da1e79641035444989006d0c54fde0e6\">As usual, we use the magnitude of the p-value to draw our conclusions. A small p-value indicates that the evidence provided by the data is strong enough to reject H<sub>o<\/sub>\u00a0and conclude (beyond a reasonable doubt) that the two variables are related. In particular, if a significance level of .05 is used, we will reject H<sub>o<\/sub>\u00a0if the p-value is less than .05.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div id=\"d6335e343e574ee594f3c01e55979d0d\" class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p id=\"e0423d5b75ef44a4863d62d6ace6010e\">A p-value of .201 is not small at all. 
There is no compelling statistical evidence to reject H<sub>o<\/sub>, and so we will continue to assume it may be true. Gender and drunk driving may be independent, and so the data suggest that a law that forbids sale of 3.2% beer to males and permits it to females is unwarranted. In fact, the Supreme Court, by a 7-2 majority, struck down the Oklahoma law as discriminatory and unjustified. In the majority opinion Justice Brennan wrote (http:\/\/www.law.umkc.edu\/faculty\/projects\/ftrials\/conlaw\/craig.html):<\/p>\r\n<p id=\"ea29bc5c9d5d4a4ea8a91354f4157e76\">\u201cClearly, the protection of public health and safety represents an important function of state and local governments. However, appellees\u2019 statistics in our view cannot support the conclusion that the gender-based distinction closely serves to achieve that objective and therefore the distinction cannot under [prior case law] withstand equal protection challenge.\u201d<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<p id=\"a8abf946ba644b16856b3c4390d6858e\">The purpose of this activity is to draw our conclusion regarding the relationship between participation in the 9\/11 rescue and risk of alcohol problems among New York firefighters and first responders.<\/p>\r\n\r\n<div id=\"c9a38f2fc30242fb9dca21c65474c59a\" class=\"pulloutwrap\">\r\n<div class=\"pullout clearfix\">\r\n<div class=\"Excel2019PC altContentOn\">\r\n<div class=\"alternative\">\r\n<p id=\"e484671afdc44a849f0175283379c0b0\">In the previous activity, we created a table of expected counts to go along with our table of observed counts. In this activity, we will use both tables to conduct a chi-square test on the data.<\/p>\r\n<p id=\"c5d19a5a811647dba659121698623119\">To do this in Excel, we first need to re-create both the table of observed counts and table of expected counts from the last exercise. 
Here are the data again for your convenience:<\/p>\r\n<p id=\"f6ed578026dd4da490cc4940bf36455d\">In the previous activity, we carried out the chi-square test using StatCrunch and obtained the following output:<\/p>\r\n<p id=\"a405a602898e41309a22b1151446ab84\"><em class=\"italic\">Contingency table results:<\/em><\/p>\r\n<p id=\"ec92782b5dd2466ca216b4065cde236b\">Rows: 911<\/p>\r\n<p id=\"e71c786cabb349c89e16a89a5a7f1449\">Columns: None<\/p>\r\n<p id=\"af41b1143020d4fd086fbbaff81ddccbf\"><span class=\"imagewrap\"><span class=\"image\"><img id=\"bb426694710740d0b832561b9b6191b4\" class=\"img-responsive popimg aligncenter\" title=\"\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/webcontent\/inline_assessment\/lbd012_statcrunch.gif\" alt=\"\" \/><\/span><\/span><\/p>\r\n\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\n[h5p id=\"240\"]\r\n\r\n<\/div>\r\n<\/div>\r\n<h2><span title=\"Quick scroll up\">Comment<\/span><\/h2>\r\n<p id=\"eb918e73eb0d4f86adaae75f47f7148b\">This is a good opportunity to illustrate an important idea that was discussed earlier in this unit: The larger the sample the results are based on, the more evidence they carry. Let\u2019s take the previous example and simply multiply each of the counts by 3:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"bba7dd41c6654553a3333e20a1bf6fa3\" class=\"img-responsive popimg aligncenter\" title=\"A two-way table for observed counts, in which the columns are labeled &amp;quot;Yes,&amp;quot; &amp;quot;No,&amp;quot; (categories of Drank Alcohol in the last 2 hours?) 
and &amp;quot;Total.&amp;quot; The rows are labeled &amp;quot;Male,&amp;quot; &amp;quot;Female,&amp;quot; and &amp;quot;Total.&amp;quot; Here is the data in the table, given in cell format (&amp;quot;Row, Column: Value&amp;quot;): Male, Yes: 231; Male, No: 1212; Male, Total: 1443; Female, Yes: 48; Female, No: 366; Female, Total: 414; Total, Yes: 279; Total, No: 1578; Total, Total: 1857;\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image144.gif\" alt=\"A two-way table for observed counts, in which the columns are labeled &amp;quot;Yes,&amp;quot; &amp;quot;No,&amp;quot; (categories of Drank Alcohol in the last 2 hours?) and &amp;quot;Total.&amp;quot; The rows are labeled &amp;quot;Male,&amp;quot; &amp;quot;Female,&amp;quot; and &amp;quot;Total.&amp;quot; Here is the data in the table, given in cell format (&amp;quot;Row, Column: Value&amp;quot;): Male, Yes: 231; Male, No: 1212; Male, Total: 1443; Female, Yes: 48; Female, No: 366; Female, Total: 414; Total, Yes: 279; Total, No: 1578; Total, Total: 1857;\" \/><\/span><\/span>\r\n<p id=\"f4c3f6c3c4ab49ba9c83473eca82c052\">and see what would have happened if these were the original data. Obviously, the conditional percents would remain the same:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"e0960f17817c4b15a13ba9187b342168\" class=\"img-responsive popimg aligncenter\" title=\"A two-way table for conditional percents, in which the columns are labeled &amp;quot;Yes&amp;quot; and &amp;quot;No&amp;quot; (categories of Drank Alcohol in the last 2 hours?). 
The rows are labeled &amp;quot;Male&amp;quot; and &amp;quot;Female.&amp;quot; Here is the data in the table, given in cell format (&amp;quot;Row, Column: Value&amp;quot;): Male, Yes: 231\/1443 = 16.0% Male, No: 1212\/1443 = 84.0% Female, Yes: 48\/414 = 11.6% Female, No: 366\/414 = 88.4%\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image145.gif\" alt=\"A two-way table for conditional percents, in which the columns are labeled &amp;quot;Yes&amp;quot; and &amp;quot;No&amp;quot; (categories of Drank Alcohol in the last 2 hours?). The rows are labeled &amp;quot;Male&amp;quot; and &amp;quot;Female.&amp;quot; Here is the data in the table, given in cell format (&amp;quot;Row, Column: Value&amp;quot;): Male, Yes: 231\/1443 = 16.0% Male, No: 1212\/1443 = 84.0% Female, Yes: 48\/414 = 11.6% Female, No: 366\/414 = 88.4%\" \/><\/span><\/span>\r\n<p id=\"ba34df6be90341948068618ca2f5eb6e\">In other words, the sample provides the \u201csame\u201d results, but this time they are based on a much larger sample (1857 instead of 619). This is reflected by the chi-square test. In this case, software gives us a chi-square statistic of 4.910 and a p-value of 0.027.<\/p>\r\n<p id=\"fc74c261239649239816ba89b0fc971e\">As before, H<sub>o<\/sub>\u00a0states that gender and drunk driving are not related; H<sub>a<\/sub>\u00a0states that they are related. Since the observed counts are triple what they were before, the expected counts are also tripled, and so each cell\u2019s contribution, (observed count \u2013 expected count)<sup>2<\/sup> \/ expected count, is tripled as well. Software, which rounds less than our hand calculation did, gives the original chi-square statistic as 1.637. The chi-square statistic for the tripled data is therefore 3 times 1.637, or 4.91, which is now in the \u201clarge\u201d range (above 3.84). Accordingly, the p-value is smaller and is now .027.<\/p>\r\n<p id=\"a9bb0ad87e214026874b78e59566b937\">Now, we do reject H<sub>o<\/sub>, and we conclude that gender and drunk driving are related. 
In this case, the \u201clargest contribution to chi-square\u201d is large enough to provide evidence of a relationship. This is due to the fact that so few females drove drunk (48) compared to the number that would be expected (62.2, which is 414 * 279 \/ 1857) if the variables gender and drunk driving were not related. This contribution is\u00a0[latex]\\frac{\\left(48-62.2\\right)^2}{62.2}=3.242[\/latex].<\/p>\r\n<p id=\"dca01463b5b944928c4fd12215716cea\">Let\u2019s look at another example.<\/p>\r\n\r\n<div id=\"bd5884a1d2d0443cb77e91f114d4c4de\" class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<h4>Steroid Use in Sports<\/h4>\r\n<div>\r\n<p id=\"becad8787f9a44b79200166f4efdfe19\">Major-league baseball star Barry Bonds admitted to using a steroid cream during the 2003 season. Is steroid use different in baseball than in other sports? 
According to the 2001 National Collegiate Athletic Association (NCAA) survey (http:\/\/www.ncaa.org\/library\/research\/substance_use_habits\/2001\/substance_use_habits.pdf), which is self-reported and was administered to a stratified random selection of teams from each of the three NCAA divisions, reported steroid use among the top 5 college sports was as follows:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"bcf63532b809450cab495a119a853cff\" class=\"img-responsive popimg aligncenter\" title=\"A two-way table which has columns labeled &amp;quot;Reported Using Steroids,&amp;quot; &amp;quot;Reported Not Using Steroids,&amp;quot; and &amp;quot;Total.&amp;quot; The row labels are: &amp;quot;Men&amp;apos;s Baseball,&amp;quot; &amp;quot;Men&amp;apos;s Basketball,&amp;quot; &amp;quot;Men&amp;apos;s Football,&amp;quot; &amp;quot;Men&amp;apos;s Tennis,&amp;quot; &amp;quot;Men&amp;apos;s track\/field,&amp;quot; and &amp;quot;Total.&amp;quot; Here is the data in cell format (Row, Column: Value): Baseball, Reported Using: 26; Baseball, Reported Not Using: 1088; Baseball, Total: 1114; Basketball, Reported Using: 13; Basketball, Reported Not Using: 881; Basketball, Total: 894; Football, Reported Using: 59; Football, Reported Not Using: 1897; Football, Total: 1956; Tennis, Reported Using: 2; Tennis, Reported Not Using: 335; Tennis, Total: 337; Track\/Field, Reported Using: 6; Track\/Field, Reported Not Using: 486; Track\/Field, Total: 492; Total, Reported Using: 106; Total, Reported Not Using: 4687; Total, Total: 4793;\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image188.gif\" alt=\"A two-way table which has columns labeled &amp;quot;Reported Using Steroids,&amp;quot; &amp;quot;Reported Not Using Steroids,&amp;quot; and &amp;quot;Total.&amp;quot; The row labels are: &amp;quot;Men&amp;apos;s Baseball,&amp;quot; &amp;quot;Men&amp;apos;s Basketball,&amp;quot; 
&amp;quot;Men&amp;apos;s Football,&amp;quot; &amp;quot;Men&amp;apos;s Tennis,&amp;quot; &amp;quot;Men&amp;apos;s track\/field,&amp;quot; and &amp;quot;Total.&amp;quot; Here is the data in cell format (Row, Column: Value): Baseball, Reported Using: 26; Baseball, Reported Not Using: 1088; Baseball, Total: 1114; Basketball, Reported Using: 13; Basketball, Reported Not Using: 881; Basketball, Total: 894; Football, Reported Using: 59; Football, Reported Not Using: 1897; Football, Total: 1956; Tennis, Reported Using: 2; Tennis, Reported Not Using: 335; Tennis, Total: 337; Track\/Field, Reported Using: 6; Track\/Field, Reported Not Using: 486; Track\/Field, Total: 492; Total, Reported Using: 106; Total, Reported Not Using: 4687; Total, Total: 4793;\" \/><\/span><\/span>\r\n<p id=\"b2418376aaa54848a029766bfe61ff6d\">Do the data provide evidence of a significant relationship between steroid use and the type of sport? In other words, are there significant differences in steroid use among the different sports?<\/p>\r\n<p id=\"a38359b7a237487885ea87ab24174ca0\">Before we carry out the chi-square test for independence, let\u2019s get a sense of the data by calculating the conditional percents:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"cb180d9913a9473a98641f1720666ff8\" class=\"img-responsive popimg aligncenter\" title=\"A two-way conditional percent table which has columns labeled &amp;quot;Reported Using Steroids,&amp;quot; &amp;quot;Reported Not Using Steroids,&amp;quot; and &amp;quot;Total.&amp;quot; The row labels are: &amp;quot;Men&amp;apos;s Baseball,&amp;quot; &amp;quot;Men&amp;apos;s Basketball,&amp;quot; &amp;quot;Men&amp;apos;s Football,&amp;quot; &amp;quot;Men&amp;apos;s Tennis,&amp;quot; &amp;quot;Men&amp;apos;s track\/field,&amp;quot; and &amp;quot;Total.&amp;quot; Here is the data in cell format (Row, Column: Value): Baseball, Reported Using: 2.3% Baseball, Reported Not Using: 97.7%; Baseball, Total: 1114; Basketball, Reported Using: 1.5%; 
Basketball, Reported Not Using: 98.5%; Basketball, Total: 894; Football, Reported Using: 3%; Football, Reported Not Using: 97%; Football, Total: 1956; Tennis, Reported Using: .6%; Tennis, Reported Not Using: 99.4%; Tennis, Total: 337; Track\/Field, Reported Using: 1.2%; Track\/Field, Reported Not Using: 98.8%; Track\/Field, Total: 492; Total, Reported Using: 106; Total, Reported Not Using: 4687; Total, Total: 4793;\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image189.gif\" alt=\"A two-way conditional percent table which has columns labeled &amp;quot;Reported Using Steroids,&amp;quot; &amp;quot;Reported Not Using Steroids,&amp;quot; and &amp;quot;Total.&amp;quot; The row labels are: &amp;quot;Men&amp;apos;s Baseball,&amp;quot; &amp;quot;Men&amp;apos;s Basketball,&amp;quot; &amp;quot;Men&amp;apos;s Football,&amp;quot; &amp;quot;Men&amp;apos;s Tennis,&amp;quot; &amp;quot;Men&amp;apos;s track\/field,&amp;quot; and &amp;quot;Total.&amp;quot; Here is the data in cell format (Row, Column: Value): Baseball, Reported Using: 2.3% Baseball, Reported Not Using: 97.7%; Baseball, Total: 1114; Basketball, Reported Using: 1.5%; Basketball, Reported Not Using: 98.5%; Basketball, Total: 894; Football, Reported Using: 3%; Football, Reported Not Using: 97%; Football, Total: 1956; Tennis, Reported Using: .6%; Tennis, Reported Not Using: 99.4%; Tennis, Total: 337; Track\/Field, Reported Using: 1.2%; Track\/Field, Reported Not Using: 98.8%; Track\/Field, Total: 492; Total, Reported Using: 106; Total, Reported Not Using: 4687; Total, Total: 4793;\" \/><\/span><\/span>\r\n<p id=\"b6d756bc072a48ed8016b0b00069189c\">There appear to be differences in steroid use among the different sports. Although the differences do not seem overwhelming, the sample size is so large that they might still be significant. 
Let\u2019s carry out the test and see.<\/p>\r\n<p id=\"d3ffb1060fc94f8eb92f43776857e2ba\"><em class=\"italic\">Step 1: Stating the hypotheses<\/em><\/p>\r\n<p id=\"e82d5fc9515249028fec93b83a328369\">The hypotheses are:<\/p>\r\n<p id=\"b4df6e71ca95400a9f325ca86bde9f20\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">o<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0Steroid use is not related to the type of sport (or: type of sport and steroid use are independent).<\/p>\r\n<p id=\"de46534ed2af462a9a9c734dcb99a9d3\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">a<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0Steroid use is related to the type of sport (or: type of sport and steroid use are not independent).<\/p>\r\n<p id=\"acbd9fdea65e478286a6fca3c5b16cbc\"><em class=\"italic\">Step 2: Checking conditions and finding the test statistic<\/em><\/p>\r\n<p id=\"d246d8faaec7441ab5e345bda586d524\">Here is the Minitab output of the chi-square test for this example:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"d326d66b43fb4b1e8720b0f1f4ad7861\" class=\"img-responsive popimg aligncenter\" title=\"Chi-Square Test: men used, men not used. 
Baseball: men used: Observed: 26, Expected: 24.64, Chi-Square contribution: 0.075; Baseball: men not used: Observed: 1088, Expected: 1089.36, Chi-Square contribution: 0.002; Baseball: Total: 1114; Basketball: men used: Observed: 13, Expected: 19.77, Chi-Square contribution: 2.319; Basketball: men not used: Observed: 881, Expected: 874.23, Chi-Square contribution: 0.052; Basketball: Total: 894; Football: men used: Observed: 59, Expected: 43.26, Chi-Square contribution: 5.729; Football: men not used: Observed: 1897, Expected: 1912.74, Chi-Square contribution: 0.130; Football: Total: 1956; Tennis: men used: Observed: 2, Expected: 7.45, Chi-Square contribution: 3.990; Tennis: men not used: Observed: 335, Expected: 329.55, Chi-Square contribution: 0.090; Tennis: Total: 337; track\/field: men used: Observed: 6, Expected: 10.88, Chi-Square contribution: 2.189; track\/field: men not used: Observed: 486, Expected: 481.12, Chi-Square contribution: 0.050; track\/field: Total: 492; Total: Men used: 106, Men not Used: 4687, Total: 4793; Chi-Sq = 14.626, DF = 4, P-Value = 0.006\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image190.gif\" alt=\"Chi-Square Test: men used, men not used. 
Baseball: men used: Observed: 26, Expected: 24.64, Chi-Square contribution: 0.075; Baseball: men not used: Observed: 1088, Expected: 1089.36, Chi-Square contribution: 0.002; Baseball: Total: 1114; Basketball: men used: Observed: 13, Expected: 19.77, Chi-Square contribution: 2.319; Basketball: men not used: Observed: 881, Expected: 874.23, Chi-Square contribution: 0.052; Basketball: Total: 894; Football: men used: Observed: 59, Expected: 43.26, Chi-Square contribution: 5.729; Football: men not used: Observed: 1897, Expected: 1912.74, Chi-Square contribution: 0.130; Football: Total: 1956; Tennis: men used: Observed: 2, Expected: 7.45, Chi-Square contribution: 3.990; Tennis: men not used: Observed: 335, Expected: 329.55, Chi-Square contribution: 0.090; Tennis: Total: 337; track\/field: men used: Observed: 6, Expected: 10.88, Chi-Square contribution: 2.189; track\/field: men not used: Observed: 486, Expected: 481.12, Chi-Square contribution: 0.050; track\/field: Total: 492; Total: Men used: 106, Men not Used: 4687, Total: 4793; Chi-Sq = 14.626, DF = 4, P-Value = 0.006\" \/><\/span><\/span>\r\n<ul id=\"ab26561d131141a896cb364787c91901\" class=\"none\">\r\n \t<li>\r\n<p id=\"e07c152be16a4c1487d479a1a7cce031\">Conditions:<\/p>\r\n\r\n<ol id=\"d845e42a027f4d78894d5f5b29bb6a7e\" class=\"lower-roman\">\r\n \t<li>\r\n<p id=\"ee1916d236bf4fd8bfb7426d803c4835\">We are told that the sample was random.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"b954a2e9ce85494db812e7f452494570\">All the expected counts are above 5.<\/p>\r\n<\/li>\r\n<\/ol>\r\n<\/li>\r\n \t<li>\r\n<p id=\"a3eabd87533843f08ea9659e57929d6f\">Test statistic:<\/p>\r\n<p id=\"fb766867822a442cb5bf845621a494d6\">The test statistic is 14.626. Note that the \u201clargest contributors\u201d to the test statistic are 5.729 and 3.990. The first value comes from the cell for football players who used steroids, which has an observed count larger than we would expect to see under independence. 
The second value comes from the cell for tennis players who used steroids, which has an observed count lower than we would expect under independence.<\/p>\r\n<\/li>\r\n<\/ul>\r\n<p id=\"c34478b2e49f4f44a4be5fb92d5a6f5e\"><em class=\"italic\">Step 3: Finding the p-value<\/em><\/p>\r\n<p id=\"b764be5ae3174b21853cb95b5454bf20\">According to the output, the p-value is 0.006: if the null hypothesis were true, it would be extremely unlikely (probability of 0.006) to get observed counts that differ from the expected counts at least as much as ours do. In other words, it would be very surprising to get data like those observed if steroid use were not related to sport type.<\/p>\r\n<p id=\"ef3640dc082e45638035dc45b7776433\"><em class=\"italic\">Step 4: Conclusion<\/em><\/p>\r\n<p id=\"c381cacca54a4aadb51a87b21b8c3f7d\">The small p-value indicates that the data provide strong evidence against the null hypothesis, so we reject it and conclude that steroid use is related to the type of sport.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div id=\"b98a6983a7944948b594f6813adb87be\" class=\"section purposewrap\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Let\u2019s Summarize<\/span><\/h2>\r\n<ul id=\"f34f5e5297c846dd8f7a8f0be74f9569\">\r\n \t<li>\r\n<p id=\"c5bcd91a752347579d8176e7b7210198\">The chi-square test for independence is used to test whether the relationship between two categorical variables is significant. 
In other words, the chi-square procedure assesses whether the data provide enough evidence that a true relationship between the two variables exists in the population.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"d11ff31417854a89bee0602e25d2af94\">The hypotheses that are being tested in the chi-square test for independence are:<\/p>\r\n\r\n<ul id=\"ac29df7dbd6c4009bc161dcd5b13b2f9\" class=\"none\">\r\n \t<li>\r\n<p id=\"deeabf81e60b49d7bef2b085f48b7e5f\">H<sub>o<\/sub>: There is no relationship between \u2026 and \u2026.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"b3f19aff88014fa2b28e0815112096fc\">H<sub>a<\/sub>: There is a relationship between \u2026 and \u2026.<\/p>\r\n<p id=\"fbf5e517934e421e8be1e4aa8cb2c132\"><\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"b4d2b81d988c4a5cbeb877db54a2c74d\">or equivalently,<\/p>\r\n<p id=\"fbe794fae8d04ccfa8060d964a829aaf\"><\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"d9db036cb4f04804aacb0f45cac3e620\">H<sub>o<\/sub>: The variables \u2026 and \u2026 are independent.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"d0bb5a4b3b8a46d08da23973181b9bdf\">H<sub>a<\/sub>: The variables \u2026 and \u2026 are not independent.<\/p>\r\n<\/li>\r\n<\/ul>\r\n<\/li>\r\n \t<li>\r\n<p id=\"bcaff40a3fb74bccbae50e79b919f933\">The idea behind the test is measuring how far the observed data are from the null hypothesis by comparing the observed counts to the expected counts\u2014the counts that we would expect to see (instead of the observed ones) had the null hypothesis been true. 
The expected count of each cell is calculated as follows:<span class=\"imagewrap\"><span class=\"image\"><img id=\"d8f101a0be7a497caa67e7c069b4b547\" class=\"img-responsive popimg aligncenter\" title=\"\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image523f.png\" alt=\"\" \/><\/span><\/span><\/p>\r\n&nbsp;<\/li>\r\n \t<li>\r\n<p id=\"a80996991b2d4e89b62aa910b27408e4\">The measure of the difference between the observed and expected counts is the chi-square test statistic, whose null distribution is called the chi-square distribution. The chi-square test statistic is calculated as follows:<span class=\"imagewrap\"><span class=\"image\"><img id=\"c3e73562f9ec4d51aaad9e9c05755f1d\" class=\"img-responsive popimg aligncenter\" title=\"\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image523g.png\" alt=\"\" \/><\/span><\/span><\/p>\r\n&nbsp;<\/li>\r\n \t<li>\r\n<p id=\"e09cb57270364676bbf2ede5625654a6\">Once we verify that the conditions that allow us to safely use the chi-square test are met, we use software to carry it out and use the p-value to guide our conclusions.<\/p>\r\n<\/li>\r\n<\/ul>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>","rendered":"<div id=\"N10AFF\" class=\"section\">\n<div class=\"sectionContain\">\n<h2>Overview<\/h2>\n<p id=\"N10B06\">The last three procedures that we studied (two-sample t, paired t, and ANOVA) all involve the relationship between a categorical explanatory variable and a quantitative response variable, corresponding to Case C\u2192Q in the role\/type classification table below. 
Next, we will consider inferences about the relationships between two categorical variables, corresponding to case C\u2192C.<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"_i_0\" class=\"img-responsive popimg aligncenter\" title=\"It is possible for any type of explanatory variable to be paired with any type of response variable. The possible pairings are: Categorical Explanatory \u2192 Categorical Response (C\u2192C), Categorical Explanatory \u2192 Quantitative Response (C\u2192Q), Quantitative Explanatory \u2192 Categorical Response (Q\u2192C), and Quantitative Explanatory \u2192 Quantitative Response (Q\u2192Q).\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image182.gif\" alt=\"It is possible for any type of explanatory variable to be paired with any type of response variable. The possible pairings are: Categorical Explanatory \u2192 Categorical Response (C\u2192C), Categorical Explanatory \u2192 Quantitative Response (C\u2192Q), Quantitative Explanatory \u2192 Categorical Response (Q\u2192C), and Quantitative Explanatory \u2192 Quantitative Response (Q\u2192Q).\" \/><\/span><\/span><\/p>\n<p id=\"N10B0F\">In the Exploratory Data Analysis unit of the course, we summarized the relationship between two categorical variables for a given data set (using a two-way table and conditional percents), without trying to generalize beyond the sample data.<\/p>\n<p id=\"N10B12\">Now we perform statistical inference for two categorical variables, using the sample data to draw conclusions about whether or not we have evidence that the variables are related in the larger population from which the sample was drawn. 
In other words, we would like to assess whether the relationship between X and Y that we observed in the data is due to a real relationship between X and Y in the population or if it is something that could have happened just by chance due to sampling variability.<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"_i_1\" class=\"img-responsive popimg aligncenter\" title=\"We have a population of interest and a question about it, which is &quot;Are the two categorical variables X and Y related?&quot; We take an SRS of size n, and summarize that data with a two-way table. Via inference, we can decide if the relationship is strong enough that we can conclude that it is due to a true relationship in the population. This inference step is what this section goes over.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image123.gif\" alt=\"We have a population of interest and a question about it, which is &quot;Are the two categorical variables X and Y related?&quot; We take an SRS of size n, and summarize that data with a two-way table. Via inference, we can decide if the relationship is strong enough that we can conclude that it is due to a true relationship in the population. This inference step is what this section goes over.\" \/><\/span><\/span><\/p>\n<p id=\"N10B1B\">The statistical test that will answer this question is called the\u00a0<em>chi-square test for independence<\/em>. Chi is a Greek letter that looks like this: [latex]\\chi[\/latex], so the test is sometimes referred to as: The [latex]\\chi ^{2}[\/latex] test for independence.<\/p>\n<p id=\"N10B3D\">The structure of this section will be very similar to that of the previous ones in this module. We will first present our leading example, and then introduce the chi-square test by going through its 4 steps, illustrating each one using the example. 
We will conclude by presenting another complete example. As usual, you\u2019ll have activities along the way to check your understanding, and to learn how to use software to carry out the test.<\/p>\n<\/div>\n<\/div>\n<p id=\"N10B42\">Let\u2019s start with our leading example.<\/p>\n<div class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p id=\"N10B47\">In the early 1970s, a young man challenged an Oklahoma state law that prohibited the sale of 3.2% beer to males under age 21 but allowed its sale to females in the same age group. The case (Craig v. Boren, 429 U.S. 190, 1976) was ultimately heard by the U.S. Supreme Court.<\/p>\n<p id=\"N10B4A\">The main justification provided by Oklahoma for the law was traffic safety. One of the 3 main pieces of data presented to the court was the result of a \u201crandom roadside survey\u201d that recorded information on gender, and whether or not the driver had been drinking alcohol in the previous two hours. There were a total of 619 drivers under 20 years of age included in the survey.<\/p>\n<p id=\"N10B4D\">Here is what the collected data looked like:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"_i_2\" class=\"img-responsive popimg aligncenter\" title=\"A table with two columns, &quot;Gender,&quot; and &quot;Drove drunk?.&quot; Each row represents one occurrence. The rows in the table (in &quot;Driver #: Gender, Drove Drunk?&quot; format): Driver 1: M, Y; Driver 2: F, N; Driver 3: F, Y; ... Driver 619: M, N;\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image126.gif\" alt=\"A table with two columns, &quot;Gender,&quot; and &quot;Drove drunk?.&quot; Each row represents one occurrence. 
The rows in the table (in &quot;Driver #: Gender, Drove Drunk?&quot; format): Driver 1: M, Y; Driver 2: F, N; Driver 3: F, Y; ... Driver 619: M, N;\" \/><\/span><\/span><\/p>\n<p id=\"N10B56\">The following two-way table summarizes the observed counts in the roadside survey:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"_i_3\" class=\"img-responsive popimg aligncenter\" title=\"A two-way table, in which the columns are labeled &quot;Yes,&quot; &quot;No,&quot; and &quot;Total.&quot; The rows are labeled &quot;Male,&quot; &quot;Female,&quot; and &quot;Total.&quot; Here is the data in the table, given in cell format (&quot;Row, Column: Value&quot;): Male, Yes: 77 Male, No: 404 Male, Total: 481 Female, Yes: 16 Female, No: 122 Female, Total: 138 Total, Yes: 93, Total, No: 526 Total, Total: 619\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image127.gif\" alt=\"A two-way table, in which the columns are labeled &quot;Yes,&quot; &quot;No,&quot; and &quot;Total.&quot; The rows are labeled &quot;Male,&quot; &quot;Female,&quot; and &quot;Total.&quot; Here is the data in the table, given in cell format (&quot;Row, Column: Value&quot;): Male, Yes: 77 Male, No: 404 Male, Total: 481 Female, Yes: 16 Female, No: 122 Female, Total: 138 Total, Yes: 93, Total, No: 526 Total, Total: 619\" \/><\/span><\/span><\/p>\n<p id=\"N10B5F\">Our task is to assess whether these results provide evidence of a significant (\u201creal\u201d) relationship between gender and drunk driving.<\/p>\n<p id=\"N10B62\">The following figure summarizes this example:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"_i_4\" class=\"img-responsive popimg aligncenter\" title=\"The population comprises of all drivers under 20. 
The question we have about the population is &quot;is drunk driving (Y) related to gender (X)?&quot; To answer this, we create a SRS of size 619 via a roadside survey. The results from this survey are summarized in the two-way table given above. Using Inference, we can figure out if the relationship of the roadside survey strong enough that we can conclude that it is due to a real relationship between drunk driving and gender in population.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image128.gif\" alt=\"The population comprises of all drivers under 20. The question we have about the population is &quot;is drunk driving (Y) related to gender (X)?&quot; To answer this, we create a SRS of size 619 via a roadside survey. The results from this survey are summarized in the two-way table given above. Using Inference, we can figure out if the relationship of the roadside survey strong enough that we can conclude that it is due to a real relationship between drunk driving and gender in population.\" \/><\/span><\/span><\/p>\n<p id=\"N10B6B\">Note that as the figure stresses, since we are looking to see whether drunk driving is related to gender, our explanatory variable (X) is gender, and the response variable (Y) is drunk driving. Both variables are two-valued categorical variables, and therefore our two-way table of observed counts is 2-by-2. It should be mentioned that the chi-square procedure that we are going to introduce here is not limited to 2-by-2 situations, but can be applied to any r-by-c situation where r is the number of rows (corresponding to the number of values of one of the variables) and c is the number of columns (corresponding to the number of values of the other variable).<\/p>\n<p id=\"N10B6E\">Before we introduce the chi-square test, let\u2019s conduct an exploratory data analysis (that is, look at the data to get an initial feel for it). 
By doing that, we will also get a better conceptual understanding of the role of the test.<\/p>\n<p id=\"N10B71\"><em>Exploratory Analysis<\/em><\/p>\n<p id=\"N10B77\">Recall that the key to reporting appropriate summaries for a two-way table is deciding which of the two categorical variables plays the role of explanatory variable, and then calculating the conditional percentages \u2014 the percentages of the response variable for each value of the explanatory variable \u2014 separately. In this case, since the explanatory variable is gender, we would calculate the percentages of drivers who did (and did not) drink alcohol for males and females separately.<\/p>\n<p id=\"N10B7A\">Here is the table of conditional percentages:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"_i_5\" class=\"img-responsive popimg aligncenter\" title=\"A two-way table, in which the columns are labeled &quot;Yes,&quot; &quot;No,&quot; (in response to the Y variable, drank alcohol in the last 2 hours) and &quot;Total.&quot; The rows are labeled &quot;Male&quot; and &quot;Female.&quot; Here is the data in the table, give in cell format (&quot;Row, Column: Value&quot;): Male, Yes: 77\/481 = 16.0% Male, No: 404\/481 = 84.0% Male, Total: 100% Female, Yes: 16\/138 = 11.6% Female, No: 122\/138 = 88.4% Female, Total: 100%\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image129.gif\" alt=\"A two-way table, in which the columns are labeled &quot;Yes,&quot; &quot;No,&quot; (in response to the Y variable, drank alcohol in the last 2 hours) and &quot;Total.&quot; The rows are labeled &quot;Male&quot; and &quot;Female.&quot; Here is the data in the table, give in cell format (&quot;Row, Column: Value&quot;): Male, Yes: 77\/481 = 16.0% Male, No: 404\/481 = 84.0% Male, Total: 100% Female, Yes: 16\/138 = 11.6% Female, No: 122\/138 = 88.4% Female, Total: 100%\" 
\/><\/span><\/span><\/p>\n<p id=\"N10B83\">For the 619 sampled drivers, a larger percentage of males were found to be drunk than females (16.0% vs. 11.6%). Our data, in other words, provide some evidence that drunk driving is related to gender; however, this in itself is not enough to conclude that such a relationship exists in the larger population of drivers under 20. We need to further investigate the data and decide between the following two points of view:<\/p>\n<ul>\n<li>\n<p id=\"N10B89\">The evidence provided by the roadside survey (16% vs 11.6%) is strong enough to conclude (beyond a reasonable doubt) that it must be due to a relationship between drunk driving and gender in the population of drivers under 20.<\/p>\n<\/li>\n<li>\n<p id=\"N10B8D\">The evidence provided by the roadside survey (16% vs. 11.6%) is not strong enough to make that conclusion, and could have happened just by chance, due to sampling variability, and not necessarily because a relationship exists in the population.<\/p>\n<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<p>Actually, these two opposing points of view constitute the null and alternative hypotheses of the chi-square test for independence, so now that we understand our example and what we still need to find out, let\u2019s introduce the four-step process of this test.<\/p>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p id=\"N10BA1\">The purpose of this activity is to introduce you to the example that you are going to work through in this section, and for you to get a feeling for the data by conducting exploratory analysis.<\/p>\n<p id=\"N10BA4\">Background: Alcoholism Risk in 9\/11 Responders<\/p>\n<p id=\"N10BA7\">Among firefighters and other &#8220;first responders&#8221; to the World Trade Center on September 11, 2001, there have been reports of increased alcohol-related difficulties (e.g., DUI). 
A survey of 9\/11 first responders (On the Front Line: The Work of First Responders in a Post-9\/11 World) conducted by Cornell researcher Samuel Bacharach was released in 2004. To see the report: <a href=\"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-content\/uploads\/sites\/206\/2024\/10\/FirefighterStress-compressed.pdf\">Firefighter Stress<\/a>. Based on the research, we can construct the following two-way table of observed counts:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"_i_7\" class=\"img-responsive popimg aligncenter\" title=\"A two-way table titled &quot;Firefighters(*) vs. Alcohol Risk, Based on a 2004 Study of NY Firefighters.&quot; The columns are &quot;No risk for alcohol problems**,&quot; &quot;Moderate to Severe risk for alcohol problems**,&quot; and &quot;Total&quot;. The rows are &quot;Participated in 911 rescue,&quot; &quot;Did Not Participate in 911 rescue,&quot; and &quot;total.&quot; Here is the data in the table, given in cell format (&quot;Row, Column: Value&quot;): Participated, No risk: 793; Participated, Moderate to severe risk: 309; Participated, Total: 1102; Did Not, No risk: 441; Did Not, Moderate to severe risk: 110; Did Not, Total: 551; Total, No Risk: 1234; Total, Moderate to Severe risk: 419; Total, Total: 1653. (*): does not include officers (**): as defined by the DSM criteria (also used by the National Institute on Alcohol Abuse) and determined by survey results.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image422.gif\" alt=\"A two-way table titled &quot;Firefighters(*) vs. Alcohol Risk, Based on a 2004 Study of NY Firefighters.&quot; The columns are &quot;No risk for alcohol problems**,&quot; &quot;Moderate to Severe risk for alcohol problems**,&quot; and &quot;Total&quot;. 
The rows are &quot;Participated in 911 rescue,&quot; &quot;Did Not Participate in 911 rescue,&quot; and &quot;total.&quot; Here is the data in the table, given in cell format (&quot;Row, Column: Value&quot;): Participated, No risk: 783; Participated, Moderate to severe risk: 309; Participated, Total: 1102; Did Not, No risk: 441; Did Not, Moderate to severe risk: 110 Did Not, Total: 551; Total, No Risk: 1234; Total, Moderate to Severe risk: 419; Total, Total: 1653. (*): does not include officers (**): as defined by the DSM criteria (also used by the National Institute on Alcohol Abuse) and determined by survey results.\" \/><\/span><\/span><\/p>\n<p id=\"N10BB6\">Using the data from this research, we would like to investigate whether alcohol risk among New York firefighters is significantly related to participation in the 9\/11 rescue.<\/p>\n<div class=\"asx\">\n<div id=\"du4_m4_cc1_tutor1\" class=\"activitywrap sectionNest flash\">\n<div class=\"activityhead\">\n<div class=\"activityinfo\"><\/div>\n<\/div>\n<div class=\"actContain\">\n<div class=\"activity flash\">\n<div id=\"u4_m4_cc1_tutor1\" class=\"flash_obj asx testFlash mark_flash\">\n<div id=\"ou4_m4_cc1_tutor1\" class=\"page 2963397\">\n<div id=\"2963397\" class=\"question\">\n<div>\n<p id=\"N1006E\">There are two categorical variables in this problem:<\/p>\n<p id=\"N10070\">* Alcohol risk (none, moderate to severe)<\/p>\n<p id=\"N10072\">* Participation in the 9\/11 rescue (yes, no)<\/p>\n<div id=\"h5p-235\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-235\" class=\"h5p-iframe\" data-content-id=\"235\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"11.2 Learn by doing\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"aef2ead184eb4d13b662dc995abc3cdf\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">The Chi-Square Test for 
Independence<\/span><\/h2>\n<p id=\"d18342f7b19940b386a4e72433910b9f\">The chi-square test for independence examines our observed data and tells us whether we have enough evidence to conclude beyond a reasonable doubt that two categorical variables are related. Much like the previous part on the ANOVA F-test, we are going to introduce the hypotheses (step 1), and then discuss the idea behind the test, which will naturally lead to the test statistic (step 2). Let\u2019s start.<\/p>\n<\/div>\n<\/div>\n<div id=\"c686cc29a7f44729a30b20e7f882d9fc\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Step 1: Stating the hypotheses<\/span><\/h2>\n<p id=\"f4cb1020357f4f43b383087fef008f82\">Unlike all the previous tests that we presented, the null and alternative hypotheses in the chi-square test are stated in words rather than in terms of population parameters. They are:<\/p>\n<p id=\"bcf23b698f88418196327d19042993d3\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">o<\/em><\/sub>: There is no relationship between the two categorical variables. (They are independent.)<\/p>\n<p id=\"c9131bddb6864ca8842f7e5d5d5d21a2\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">a<\/em><\/sub>: There is a relationship between the two categorical variables. 
(They are not independent.)<\/p>\n<div id=\"d525f2f5cc764ca8980a56bb2a945c8f\" class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div>\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p id=\"d74a7c01687b4da0a7d8c4913a027797\">In our example, the null and alternative hypotheses would then state:<\/p>\n<p id=\"b16aa766db3547b2958378478bc27bd1\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">o<\/em><\/sub>: There is no relationship between gender and drunk driving.<\/p>\n<p id=\"eedcff2c6d52451db8da3b2d65f1b11c\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">a<\/em><\/sub>: There is a relationship between gender and drunk driving.<\/p>\n<p id=\"d89a0dcb1b3f404c95358ee3451b945d\">Or equivalently,<\/p>\n<p id=\"af9bc2dfd74949e482bbd22effec725c\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">o<\/em><\/sub>: Drunk driving and gender are independent<\/p>\n<p id=\"befc6bf1e0e04b7784834b758e73c91d\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">a<\/em><\/sub>: Drunk driving and gender are not independent<\/p>\n<p id=\"f27562f5f81044a7bd27528700f747ac\">and hence the name \u201cchi-square test for independence.\u201d<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"d7b9768cc9914c5da6227a08fba3e30c\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Comment<\/span><\/h2>\n<p id=\"cedff90d834b4b34a70dd2e180c1cda1\">Algebraically, independence between gender and driving drunk is equivalent to having equal proportions who drank (or did not drink) for males vs. females. 
In fact, the null and alternative hypotheses could have been re-formulated as<\/p>\n<p id=\"e78e856b900547cc8aa7e3643d3bee81\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">o<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0proportion of male drunk drivers = proportion of female drunk drivers<\/p>\n<p id=\"b82a15c380de481d8308f89974ceb8f7\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">a<\/em><\/sub><em class=\"italic\">:\u00a0<\/em>proportion of male drunk drivers \u2260 proportion of female drunk drivers<\/p>\n<p id=\"c223590fbef1471082a205a760fe152c\">However, expressing the hypotheses in terms of proportions works well and is quite intuitive for two-by-two tables, but the formulation becomes very cumbersome when at least one of the variables has several possible values, not just two. We are therefore going to always stick with the \u201cwordy\u201d form of the hypotheses presented in step 1 above.<\/p>\n<\/div>\n<\/div>\n<div id=\"e9f0766b5b754c468974f4918b58ef10\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">The Idea of the Chi-Square Test<\/span><\/h2>\n<p id=\"d2403ca399c647178d5baa78e8e0cf06\">The idea behind the chi-square test, much like previous tests that we\u2019ve introduced, is to measure how far the data are from what is claimed in the null hypothesis. The further the data are from the null hypothesis, the more evidence the data presents against it. We\u2019ll use our data to develop this idea. Our data are represented by the observed counts:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"a9ed2a49426d4bc2adcd61accfb079c7\" class=\"img-responsive popimg aligncenter\" title=\"The two-way table with counts. The cells which are not in a Total row or column are the observed counts. 
Full description: A two-way table, in which the columns are labeled &amp;quot;Yes,&amp;quot; &amp;quot;No,&amp;quot; and &amp;quot;Total.&amp;quot; The rows are labeled &amp;quot;Male,&amp;quot; &amp;quot;Female,&amp;quot; and &amp;quot;Total.&amp;quot; Here is the data in the table, given in cell format (&amp;quot;Row, Column: Value&amp;quot;): Male, Yes: 77; Male, No: 404; Male, Total: 481; Female, Yes: 16; Female, No: 122; Female, Total: 138; Total, Yes: 93; Total, No: 526; Total, Total: 619; The observed counts are Male, Yes; Male, No; Female, Yes; Female, No;\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image131.gif\" alt=\"The two-way table with counts. The cells which are not in a Total row or column are the observed counts. Full description: A two-way table, in which the columns are labeled &amp;quot;Yes,&amp;quot; &amp;quot;No,&amp;quot; and &amp;quot;Total.&amp;quot; The rows are labeled &amp;quot;Male,&amp;quot; &amp;quot;Female,&amp;quot; and &amp;quot;Total.&amp;quot; Here is the data in the table, given in cell format (&amp;quot;Row, Column: Value&amp;quot;): Male, Yes: 77; Male, No: 404; Male, Total: 481; Female, Yes: 16; Female, No: 122; Female, Total: 138; Total, Yes: 93; Total, No: 526; Total, Total: 619; The observed counts are Male, Yes; Male, No; Female, Yes; Female, No;\" \/><\/span><\/span><\/p>\n<p id=\"ab0c2a81916847e0beffbcb86e21c27a\">How will we represent the null hypothesis?<\/p>\n<p id=\"c4607985bb2f460991e69cdf829434ab\">In the previous tests we introduced, the null hypothesis was represented by the null value. 
Here there is not really a null value, but rather a claim that the two categorical variables (drunk driving and gender, in this case) are independent.<\/p>\n<p id=\"d44f4499c0fd4853a048bd889cf5bbe4\">To represent the null hypothesis, we will calculate another set of counts \u2014 the counts that we would expect to see (instead of the observed ones) if drunk driving and gender were really independent (i.e., if H<sub>o<\/sub>\u00a0were true). For example, we actually observed 77 males who drove drunk; if drunk driving and gender were indeed independent (if H<sub>o<\/sub>\u00a0were true), how many male drunk drivers would we expect to see instead of 77? Similarly, we can ask the same kind of question about (and calculate) the other three cells in our table.<\/p>\n<p id=\"aaf3052ee6e44bb48ea7426a3049e034\">In other words, we will have two sets of counts:<\/p>\n<ul id=\"b8411479b51244e4b36aeeb3928ddd8f\">\n<li>\n<p id=\"c94a44bc2fee48799ace6e67f2e48a71\">the observed counts (the data)<\/p>\n<\/li>\n<li>\n<p id=\"c63e4db632f0446c8d77a4d2a0bfbb2d\">the expected counts (if H<sub>o<\/sub>\u00a0were true)<\/p>\n<\/li>\n<\/ul>\n<p id=\"b4c97e60335348dba00fbbe2200d0d14\">We will measure how far the observed counts are from the expected ones. Ultimately, we will base our decision on the size of the discrepancy between what we observed and what we would expect to observe if H<sub>o<\/sub>\u00a0were true.<\/p>\n<p id=\"c0191e125b144c2fbe753ccd391031ab\">How are the expected counts calculated? Once again, we are in need of probability results. Recall from the probability section that if events A and B are independent, then P(A and B) = P(A) * P(B). 
We use this rule for calculating expected counts, one cell at a time.<\/p>\n<p id=\"f3e682632f0d402d9dd675658f9fef1d\">Here again are the observed counts:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"da94bd7714d94826898ce112ebc4b2de\" class=\"img-responsive popimg aligncenter\" title=\"A two-way table, in which the columns are labeled &amp;quot;Yes,&amp;quot; &amp;quot;No,&amp;quot; and &amp;quot;Total.&amp;quot; The rows are labeled &amp;quot;Male,&amp;quot; &amp;quot;Female,&amp;quot; and &amp;quot;Total.&amp;quot; Here is the data in the table, given in cell format (&amp;quot;Row, Column: Value&amp;quot;): Male, Yes: 77; Male, No: 404; Male, Total: 481; Female, Yes: 16; Female, No: 122; Female, Total: 138; Total, Yes: 93; Total, No: 526; Total, Total: 619;\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image132.gif\" alt=\"A two-way table, in which the columns are labeled &amp;quot;Yes,&amp;quot; &amp;quot;No,&amp;quot; and &amp;quot;Total.&amp;quot; The rows are labeled &amp;quot;Male,&amp;quot; &amp;quot;Female,&amp;quot; and &amp;quot;Total.&amp;quot; Here is the data in the table, given in cell format (&amp;quot;Row, Column: Value&amp;quot;): Male, Yes: 77; Male, No: 404; Male, Total: 481; Female, Yes: 16; Female, No: 122; Female, Total: 138; Total, Yes: 93; Total, No: 526; Total, Total: 619;\" \/><\/span><\/span><\/p>\n<p id=\"cd79f15275204594b5e3aaaa8e8677a5\">Applying the rule to the first (top left) cell, if driving drunk and gender were independent then:<\/p>\n<p id=\"de9e5605242544a8a07726814c525122\">P(drunk and male) = P(drunk) * P(male)<\/p>\n<p id=\"a4231dd0692a4d1d986eee775a326856\">By dividing the counts in our table, we see that:<\/p>\n<p id=\"f8bed347190a44fea3da273f9ba931e1\">P(Drunk) = 93 \/ 619 and<\/p>\n<p id=\"b3a32553357d495ab1ae7f8795c493f8\">P(Male) = 481 \/ 619,<\/p>\n<p 
id=\"d49f3665c13544c79710a57e85fd5f8c\">and so,<\/p>\n<p id=\"abf7f8795f77447da39cdb3a2a0a4bff\">P(Drunk and Male) = (93 \/ 619) (481 \/ 619)<\/p>\n<p id=\"e095f87aadc14e8195b30aee6d3bd814\">Therefore, since there are a total of 619 drivers,\u00a0<em class=\"italic\">if drunk driving and gender were independent<\/em>, the\u00a0<em class=\"italic\">count\u00a0<\/em>of drunk male drivers that we would\u00a0<em class=\"italic\">expect<\/em>\u00a0to see is:<\/p>\n<p>[latex]619*P(Drunk\\ and\\ Male)=619\\left ( \\frac{93}{619} \\right )\\left ( \\frac{481}{619} \\right )=\\frac{93*481}{619}[\/latex]<\/p>\n<p id=\"eb2622b262c34ca4821260cd59170b24\">Notice that this expression is the product of the column and row totals for that particular cell, divided by the overall table total.<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"a2dea1be38dc435dac7704466d99773c\" class=\"img-responsive popimg aligncenter\" title=\"P(Drunk and Male) is calculated using 3 cells from the two-way table. These are the row total (Male, Total) cell, the column total (Total, Yes) cell, and table total (Total, Total) cell. P(Drunk and Male) = (column total * row total)\/(table total)\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image133.gif\" alt=\"P(Drunk and Male) is calculated using 3 cells from the two-way table. These are the row total (Male, Total) cell, the column total (Total, Yes) cell, and table total (Total, Total) cell. 
P(Drunk and Male) = (column total * row total)\/(table total)\" \/><\/span><\/span><\/p>\n<p id=\"ae705ae0a37f4200a9abfdb7c2512d92\">Similarly, if the variables are independent,<\/p>\n<p id=\"f57cb8e5a3cc409092ef4b66d64b70e5\">P(Drunk and Female) = P(Drunk) * P(Female) = (93 \/ 619) (138 \/ 619)<\/p>\n<p id=\"f35f1547464c43f5bdc06cca827ba492\">and the expected count of females driving drunk would be<\/p>\n<p>[latex]619*P(Drunk\\ and\\ Female)=619\\left ( \\frac{93}{619} \\right )\\left ( \\frac{138}{619} \\right )=\\frac{93*138}{619}[\/latex]<\/p>\n<p id=\"de35a57a4a234e34881825fbfa821daf\">Again, the expected count equals the product of the corresponding column and row totals, divided by the overall table total:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"b7348a857c7146a19c6d99031a16858b\" class=\"img-responsive popimg aligncenter\" title=\"P(Drunk and Female) is calculated using 3 cells from the two-way table. These are the row total (Female, Total) cell, the column total (Total, Yes) cell, and table total (Total, Total) cell. P(Drunk and Female) = (column total * row total)\/(table total)\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image134.gif\" alt=\"P(Drunk and Female) is calculated using 3 cells from the two-way table. These are the row total (Female, Total) cell, the column total (Total, Yes) cell, and table total (Total, Total) cell. 
P(Drunk and Female) = (column total * row total)\/(table total)\" \/><\/span><\/span><\/p>\n<p id=\"f21c1499ae9e471080c59c2e2b52421b\">This will always be the case, and will help streamline our calculations:<\/p>\n<p>[latex]Expected\\ Count=\\frac{Column\\ total\\ \\ast Row\\ total}{Table\\ total}[\/latex]<\/p>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div id=\"h5p-236\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-236\" class=\"h5p-iframe\" data-content-id=\"236\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"11.2 Learn by doing 1\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<p id=\"f79d31dc9b1d40dba866091388c943c6\">Here is the complete table of expected counts, followed by the table of observed counts:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"d3cbcfa8767d4ad293d59bbe7d8fb41e\" class=\"img-responsive popimg aligncenter\" title=\"A two-way table for expected counts, in which the columns are labeled &amp;quot;Yes,&amp;quot; &amp;quot;No,&amp;quot; and &amp;quot;Total.&amp;quot; The rows are labeled &amp;quot;Male,&amp;quot; &amp;quot;Female,&amp;quot; and &amp;quot;Total.&amp;quot; Here is the data in the table, given in cell format (&amp;quot;Row, Column: Value&amp;quot;): Male, Yes: (93 * 481)\/619 = 72.3; Male, No: (526 * 481)\/619 = 408.7; Male, Total: 481; Female, Yes: (93 * 138)\/619 = 20.7; Female, No: (526 * 138)\/619 = 117.3; Female, Total: 138; Total, Yes: 93; Total, No: 526; Total, Total: 619;\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image136.gif\" alt=\"A two-way table for expected counts, in which the columns are labeled &amp;quot;Yes,&amp;quot; &amp;quot;No,&amp;quot; and &amp;quot;Total.&amp;quot; The rows are 
labeled &amp;quot;Male,&amp;quot; &amp;quot;Female,&amp;quot; and &amp;quot;Total.&amp;quot; Here is the data in the table, given in cell format (&amp;quot;Row, Column: Value&amp;quot;): Male, Yes: (93 * 481)\/619 = 72.3; Male, No: (526 * 481)\/619 = 408.7; Male, Total: 481; Female, Yes: (93 * 138)\/619 = 20.7; Female, No: (526 * 138)\/619 = 117.3; Female, Total: 138; Total, Yes: 93; Total, No: 526; Total, Total: 619;\" \/><\/span><\/span><\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"e12bc80c342c4aa6820a38be0eaa2104\" class=\"img-responsive popimg aligncenter\" title=\"A two-way table for observed counts, in which the columns are labeled &amp;quot;Yes,&amp;quot; &amp;quot;No,&amp;quot; and &amp;quot;Total.&amp;quot; The rows are labeled &amp;quot;Male,&amp;quot; &amp;quot;Female,&amp;quot; and &amp;quot;Total.&amp;quot; Here is the data in the table, given in cell format (&amp;quot;Row, Column: Value&amp;quot;): Male, Yes: 77; Male, No: 404; Male, Total: 481; Female, Yes: 16; Female, No: 122; Female, Total: 138; Total, Yes: 93; Total, No: 526; Total, Total: 619;\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image137.gif\" alt=\"A two-way table for observed counts, in which the columns are labeled &amp;quot;Yes,&amp;quot; &amp;quot;No,&amp;quot; and &amp;quot;Total.&amp;quot; The rows are labeled &amp;quot;Male,&amp;quot; &amp;quot;Female,&amp;quot; and &amp;quot;Total.&amp;quot; Here is the data in the table, given in cell format (&amp;quot;Row, Column: Value&amp;quot;): Male, Yes: 77; Male, No: 404; Male, Total: 481; Female, Yes: 16; Female, No: 122; Female, Total: 138; Total, Yes: 93; Total, No: 526; Total, Total: 619;\" \/><\/span><\/span><\/p>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\n<\/header>\n<div 
class=\"textbox__content\">\n<p id=\"efdf8424cceb416f96350da66fb147ea\">A study was done on the relationship between gender and piercing among high-school students. A sample of 1,000 students was chosen, then classified according to gender and according to whether or not they had either of their ears pierced. The results of the study are summarized in the following 2-by-2 table:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"f70131904009407ea2e16bb6bf1ecdf3\" class=\"img-responsive popimg aligncenter\" title=\"A two-way table with &amp;quot;Yes Piercing,&amp;quot; &amp;quot;No Piercing,&amp;quot; and &amp;quot;Total&amp;quot; columns. The rows are &amp;quot;Male,&amp;quot; &amp;quot;Female,&amp;quot; and &amp;quot;Total.&amp;quot; Here is the data in the table, given in cell format (&amp;quot;Row, Column: Value&amp;quot;): Female, Yes: 576; Female, No: 64; Female, Total: 640; Male, Yes: 72; Male, No: 288; Male, Total: 360; Total, Yes: 648; Total, No: 352; Total, Total: 1000\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image425.gif\" alt=\"A two-way table with &amp;quot;Yes Piercing,&amp;quot; &amp;quot;No Piercing,&amp;quot; and &amp;quot;Total&amp;quot; columns. 
The rows are &amp;quot;Male,&amp;quot; &amp;quot;Female,&amp;quot; and &amp;quot;Total.&amp;quot; Here is the data in the table, given in cell format (&amp;quot;Row, Column: Value&amp;quot;): Female, Yes: 576; Female, No: 64; Female, Total: 640; Male, Yes: 72; Male, No: 288; Male, Total: 360; Total, Yes: 648; Total, No: 352; Total, Total: 1000\" \/><\/span><\/span><\/p>\n<div id=\"h5p-237\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-237\" class=\"h5p-iframe\" data-content-id=\"237\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"11.2 Did I get this 1\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<p>We see that there are differences between the observed and expected counts in the respective cells. We now have to come up with a measure that will quantify these differences. This is the chi-square test statistic.<\/p>\n<div id=\"e162a5169a8843cb838035910712238a\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Step 2: Checking the Conditions and Calculating the Test Statistic<\/span><\/h2>\n<p id=\"b3928ee4160d456b9d122818b36d0362\">Given our discussion on the previous page, it would be natural to present the test statistic, and then come back to the conditions that allow us to safely use the chi-square test, although in practice this is done the other way around.<\/p>\n<p id=\"fba287d452d645269a1f37e16993b3d6\">The single number that summarizes the overall difference between observed and expected counts is the chi-square statistic\u00a0<span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-msup\"><span class=\"mjx-base\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03c7<\/span><\/span><\/span><span class=\"mjx-sup\"><span class=\"mjx-mn\"><span class=\"mjx-char MJXc-TeX-main-R\">2<\/span><\/span><\/span><\/span><\/span><\/span><\/span>\u00a0, which tells us in a standardized way how far what we 
observed (data) is from what would be expected if H<sub>o<\/sub>\u00a0were true.<\/p>\n<p id=\"fed43793f07a420585addca3e1230903\">Here it is:<\/p>\n<p>[latex]\\chi^2=\\sum_{all\\ cells}\\frac{\\left(Observed\\ Count-Expected\\ Count\\right)^2}{Expected\\ Count}[\/latex]<\/p>\n<\/div>\n<\/div>\n<div id=\"c95ab22af4714b96bb076cfb24d83f7e\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Comment<\/span><\/h2>\n<p id=\"b6ca4b0efe2543a8abb5c2286fbaaa30\">As we expected,\u00a0<span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-msup\"><span class=\"mjx-base\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03c7<\/span><\/span><\/span><sup><span class=\"mjx-sup\"><span class=\"mjx-mn\"><span class=\"mjx-char MJXc-TeX-main-R\">2<\/span><\/span><\/span><\/sup><\/span><\/span><\/span><\/span>\u00a0is based on each of the differences: observed count \u2013 expected count (one such difference for each cell). But why is each difference squared, and why do we divide each squared difference by the expected count? We do that so that\u00a0<span id=\"MathJax-Element-4-Frame\" class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-msup\"><span class=\"mjx-base\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03c7<\/span><\/span><\/span><sup><span class=\"mjx-sup\"><span class=\"mjx-mn\"><span class=\"mjx-char MJXc-TeX-main-R\">2<\/span><\/span><\/span><\/sup><\/span><\/span><\/span><\/span>\u00a0will have a known null distribution (under which p-values can be easily calculated). 
The details are really beyond the scope of this course, but we will just say that the null distribution of\u00a0<span id=\"MathJax-Element-5-Frame\" class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-msup\"><span class=\"mjx-base\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03c7<\/span><\/span><\/span><sup><span class=\"mjx-sup\"><span class=\"mjx-mn\"><span class=\"mjx-char MJXc-TeX-main-R\">2<\/span><\/span><\/span><\/sup><\/span><\/span><\/span><\/span>\u00a0is called chi-square (which is not very surprising given that the test is called the chi-square test), and like the t-distributions there are many chi-square distributions distinguished by the number of degrees of freedom associated with them.<\/p>\n<\/div>\n<\/div>\n<div id=\"ada8c09578ea4f02bf0555967836b33e\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Conditions under Which the Chi-Square Test Can Safely Be Used<\/span><\/h2>\n<ol id=\"c8346e9c11b34abcbe9214acd20480d8\" class=\"lower-roman\">\n<li>\n<p id=\"cb3502f65fec4046b8617164a922f111\">The sample should be random.<\/p>\n<\/li>\n<li>\n<p id=\"ad252e4f548c4392b98030d66f32765b\">In general, the larger the sample, the more accurate and reliable the test results are. There are different versions of what the conditions are that will ensure reliable use of the test, all of which involve the expected counts. One version of the conditions says that all expected counts need to be greater than 1, and at least 80% of expected counts need to be greater than 5. 
A more conservative version requires that all expected counts are larger than 5.<\/p>\n<\/li>\n<\/ol>\n<div id=\"c18a5785e83841ddb07ff5ff723d2fb7\" class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p id=\"ddcc2e8038f440d99acc6ca1095a2d19\">Here, again, are the observed and expected counts.<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"d7eca2291f9a4a99bd20f569106649f8\" class=\"img-responsive popimg aligncenter\" title=\"A two-way table for observed and expected counts, in which the columns are labeled &amp;quot;Yes,&amp;quot; &amp;quot;No,&amp;quot; and &amp;quot;Total.&amp;quot; The rows are labeled &amp;quot;Male,&amp;quot; &amp;quot;Female,&amp;quot; and &amp;quot;Total.&amp;quot; Here is the data in the table, given in cell format (&amp;quot;Row, Column: Value&amp;quot;): Male, Yes: observed: 77 expected: 72.3 Male, No: observed: 404 expected: 408.7 Male, Total: 481 Female, Yes: observed: 16, expected: 20.7 Female, No: observed: 122, expected: 117.3 Female, Total: 138 Total, Yes: 93, Total, No: 526 Total, Total: 619\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image141.gif\" alt=\"A two-way table for observed and expected counts, in which the columns are labeled &amp;quot;Yes,&amp;quot; &amp;quot;No,&amp;quot; and &amp;quot;Total.&amp;quot; The rows are labeled &amp;quot;Male,&amp;quot; &amp;quot;Female,&amp;quot; and &amp;quot;Total.&amp;quot; Here is the data in the table, given in cell format (&amp;quot;Row, Column: Value&amp;quot;): Male, Yes: observed: 77 expected: 72.3 Male, No: observed: 404 expected: 408.7 Male, Total: 481 Female, Yes: observed: 16, expected: 20.7 Female, No: observed: 122, expected: 117.3 Female, Total: 138 Total, Yes: 
93, Total, No: 526 Total, Total: 619\" \/><\/span><\/span><\/p>\n<p id=\"f4fb35ef4c664e75a46683c260cc3f70\">Checking the conditions:<\/p>\n<ol id=\"a3602a79a8c14bb5a756bc72ef8bcf53\" class=\"lower-roman\">\n<li>\n<p id=\"a159a94029244bb7b5ef81bce0c62adc\">The roadside survey is known to have been random.<\/p>\n<\/li>\n<li>\n<p id=\"af49206e0cea479ea88d8d42e24bdc26\">All the expected counts are above 5.<\/p>\n<p id=\"c1f425821c8d4e8598c67598de39bda9\">We can therefore safely proceed with the chi-square test, and the chi-square test statistic is:<\/p>\n<p>[latex]\\frac{\\left(77-72.3\\right)^2}{72.3}+\\frac{\\left(404-408.7\\right)^2}{408.7}+\\frac{\\left(16-20.7\\right)^2}{20.7}+\\frac{\\left(122-117.3\\right)^2}{117.3}=.306+.054+1.067+.188=1.62[\/latex]<\/li>\n<\/ol>\n<\/div>\n<\/div>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p id=\"f5a14d97bb9d4451aaddf0055b93c5eb\">A study was done on the relationship between gender and piercing among high-school students. A sample of 1,000 students was chosen, and then classified according to both gender and whether or not they had either of their ears pierced. 
The following (edited) StatCrunch output is available:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"cd33a0db5e71492d93889494c63e8c34\" class=\"img-responsive popimg aligncenter\" title=\"Female, Pierced: Count: 576, Expected Count: 414.7 Female, No Pierced: Count: 64, Expected Count: 225.3 Female, Total: 640 Male, Pierced: Count: 72, Expected Count: 233.3 Male, No Pierced: Count: 288, Expected Count: 126.7 Male, Total: 360 Total, Pierced: 648, No Pierced: 352, Total: 1000 Chi-Squared test: Chi-Square: DF = 1, P-value &amp;lt; 0.0001\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image426_statcrunch.gif\" alt=\"Female, Pierced: Count: 576, Expected Count: 414.7 Female, No Pierced: Count: 64, Expected Count: 225.3 Female, Total: 640 Male, Pierced: Count: 72, Expected Count: 233.3 Male, No Pierced: Count: 288, Expected Count: 126.7 Male, Total: 360 Total, Pierced: 648, No Pierced: 352, Total: 1000 Chi-Squared test: Chi-Square: DF = 1, P-value &amp;lt; 0.0001\" \/><\/span><\/span><\/p>\n<div id=\"h5p-238\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-238\" class=\"h5p-iframe\" data-content-id=\"238\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"11.2 Did I get this 2\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<h2><span title=\"Quick scroll up\">Comment<\/span><\/h2>\n<p id=\"cede765c8b67485fa95c8e087069349f\">Once the chi-square statistic has been calculated, we can get a feel for its size: is there a relatively large difference between what we observed and what the null hypothesis claims, or a relatively small one? It turns out that for a 2-by-2 case like ours, we are inclined to call the chi-square statistic \u201clarge\u201d if it is larger than 3.84. 
Our test statistic of 1.62 is therefore not large, indicating that the data are not different enough from what the null hypothesis claims for us to reject it (we will also see that in the p-value not being small). For other cases (other than 2-by-2) there are different cut-offs for what is considered large, which are determined by the null distribution in that case. We are therefore going to rely only on the p-value to draw our conclusions. Even though we will not use the chi-square statistic directly to draw conclusions, it was important to learn about it, since it encompasses the idea behind the test.<\/p>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p id=\"fd5b03c1d91c42d7a8097e35c1c740ca\">The purpose of this activity is to continue to explore whether the risk of alcohol problems among New York firefighters and first responders is related to participation in the 9\/11 rescue. In particular, in this activity, we will state the hypotheses that are being tested, learn how to carry out the chi-square test for independence using statistical software, and check whether the conditions under which this test can be safely used are met.<\/p>\n<table id=\"c8af3840295240c9abbd714bcbbfbeff_bx\" class=\"table labeled aligncenter\">\n<thead>\n<tr>\n<th>Observed data<\/th>\n<\/tr>\n<\/thead>\n<tfoot>\n<tr>\n<td class=\"captionwrap\">\n<p id=\"ac915985ae7244abd960f688d89b42a2c\">New York Firefighters and first responders<\/p>\n<p>&nbsp;<\/td>\n<\/tr>\n<\/tfoot>\n<tbody>\n<tr>\n<td>\n<table id=\"c8af3840295240c9abbd714bcbbfbeff\" class=\"grid\" style=\"border-spacing: 0px; margin: auto;\">\n<thead>\n<tr>\n<th colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"aa58eda83f9384db2acd68a8369673b77\">\n<\/th>\n<th colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"ac53d63ee30d1452eb5c4e278a124b609\">No risk for alcohol problems<\/p>\n<\/th>\n<th colspan=\"1\" rowspan=\"1\" 
align=\"left\">\n<p id=\"ad923c5f12ed64bf5b0706364c0561160\">Moderate to Severe risk for alcohol problems<\/p>\n<\/th>\n<th colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"ade73d6d5bdbf46beb43f1b20535f9768\">Total<\/p>\n<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<th colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"afe74a1729df44a0b9876716a4a0f0853\">Participated in 9\/11 rescue<\/p>\n<\/th>\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"ab1d470c99d8a42b7a210ca9e0641bbd7\">793<\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"ab8715b0986a842daac7ea8cc6c4e2712\">309<\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"aadbb1fd1050487c82535db0dccd08fe\"><em class=\"bold\">1102<\/em><\/p>\n<\/td>\n<\/tr>\n<tr class=\"e\">\n<th colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"aaffbdb3df1714e75b78331c529d4ad1a\">Did Not Participate in 9\/11 rescue<\/p>\n<\/th>\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"af54b9e47176b49e5bae9ae96b6df8cd6\">441<\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"ac5d0957e734b45e3bbf3ba068060dba5\">110<\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"f39e3227b2da4ab7a1cc4c57e308da5a\"><em class=\"bold\">551<\/em><\/p>\n<\/td>\n<\/tr>\n<tr>\n<th colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"aa8b2d1d8fe64412b98ed4aa93059e721\">Total<\/p>\n<\/th>\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"a939da9ff1684082b1738c73b4e651f2\"><em class=\"bold\">1234<\/em><\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"ffe554203f9a40dc903016571d23b191\"><em class=\"bold\">419<\/em><\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"f20fdcc9748048aab2f318b3accff063\"><em class=\"bold\">1653<\/em><\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<div id=\"h5p-239\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-239\" class=\"h5p-iframe\" 
data-content-id=\"239\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"11.2 Did I get this 3\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"f91f62332b614080950906b1d57158a1\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Step 3: Finding the p-value<\/span><\/h2>\n<p id=\"a44afcc3857f481db030c8b010e9c86c\">The p-value for the chi-square test for independence is the probability of getting counts like those observed, assuming that the two variables are not related (which is what is claimed by the null hypothesis). The smaller the p-value, the more surprising it would be to get counts like we did, if the null hypothesis were true.<\/p>\n<p id=\"a04dca1d85ef42169afa93129941aaf1\">Technically, the p-value is the probability of observing\u00a0<span class=\"mjx-chtml MathJax_CHTML\"><span class=\"mjx-math\"><span class=\"mjx-mrow\"><span class=\"mjx-msup\"><span class=\"mjx-base\"><span class=\"mjx-mi\"><span class=\"mjx-char MJXc-TeX-math-I\">\u03c7<\/span><\/span><\/span><span class=\"mjx-sup\"><span class=\"mjx-mn\"><span class=\"mjx-char MJXc-TeX-main-R\">2<\/span><\/span><\/span><\/span><\/span><\/span><\/span>\u00a0at least as large as the one observed. Using statistical software, we find that the p-value for this test is 0.201.<\/p>\n<\/div>\n<\/div>\n<div id=\"b34795ce63b54ab39053342641fe1d07\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Step 4: Stating the conclusion in context<\/span><\/h2>\n<p id=\"da1e79641035444989006d0c54fde0e6\">As usual, we use the magnitude of the p-value to draw our conclusions. A small p-value indicates that the evidence provided by the data is strong enough to reject H<sub>o<\/sub>\u00a0and conclude (beyond a reasonable doubt) that the two variables are related. 
In particular, if a significance level of .05 is used, we will reject H<sub>o<\/sub>\u00a0if the p-value is less than .05.<\/p>\n<\/div>\n<\/div>\n<div id=\"d6335e343e574ee594f3c01e55979d0d\" class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p id=\"e0423d5b75ef44a4863d62d6ace6010e\">A p-value of .201 is not small at all. There is no compelling statistical evidence to reject H<sub>o<\/sub>, and so we will continue to assume it may be true. Gender and drunk driving may be independent, and so the data suggest that a law that forbids sale of 3.2% beer to males and permits it to females is unwarranted. In fact, the Supreme Court, by a 7-2 majority, struck down the Oklahoma law as discriminatory and unjustified. In the majority opinion Justice Brennan wrote (http:\/\/www.law.umkc.edu\/faculty\/projects\/ftrials\/conlaw\/craig.html):<\/p>\n<p id=\"ea29bc5c9d5d4a4ea8a91354f4157e76\">\u201cClearly, the protection of public health and safety represents an important function of state and local governments. 
However, appellees\u2019 statistics in our view cannot support the conclusion that the gender-based distinction closely serves to achieve that objective and therefore the distinction cannot under [prior case law] withstand equal protection challenge.\u201d<\/p>\n<\/div>\n<\/div>\n<p id=\"a8abf946ba644b16856b3c4390d6858e\">The purpose of this activity is to draw our conclusion regarding the relationship between participation in the 9\/11 rescue and risk of alcohol problems among New York firefighters and first responders.<\/p>\n<div id=\"c9a38f2fc30242fb9dca21c65474c59a\" class=\"pulloutwrap\">\n<div class=\"pullout clearfix\">\n<div class=\"Excel2019PC altContentOn\">\n<div class=\"alternative\">\n<p id=\"e484671afdc44a849f0175283379c0b0\">In the previous activity, we created a table of expected counts to go along with our table of observed counts. In this activity, we will use both tables to conduct a chi-square test on the data.<\/p>\n<p id=\"c5d19a5a811647dba659121698623119\">To do this in Excel, we first need to re-create both the table of observed counts and table of expected counts from the last exercise. 
Here are the data again for your convenience:<\/p>\n<p id=\"f6ed578026dd4da490cc4940bf36455d\">In the previous activity, we carried out the chi-square test using StatCrunch and obtained the following output:<\/p>\n<p id=\"a405a602898e41309a22b1151446ab84\"><em class=\"italic\">Contingency table results:<\/em><\/p>\n<p id=\"ec92782b5dd2466ca216b4065cde236b\">Rows: 911<\/p>\n<p id=\"e71c786cabb349c89e16a89a5a7f1449\">Columns: None<\/p>\n<p id=\"af41b1143020d4fd086fbbaff81ddccbf\"><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"bb426694710740d0b832561b9b6191b4\" class=\"img-responsive popimg aligncenter\" title=\"\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/webcontent\/inline_assessment\/lbd012_statcrunch.gif\" alt=\"\" \/><\/span><\/span><\/p>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div id=\"h5p-240\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-240\" class=\"h5p-iframe\" data-content-id=\"240\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"11.2 Learn by doing 2\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<h2><span title=\"Quick scroll up\">Comment<\/span><\/h2>\n<p id=\"eb918e73eb0d4f86adaae75f47f7148b\">This is a good opportunity to illustrate an important idea that was discussed earlier in this unit: The larger the sample the results are based on, the more evidence they carry. Let\u2019s take the previous example and simply multiply each of the counts by 3:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"bba7dd41c6654553a3333e20a1bf6fa3\" class=\"img-responsive popimg aligncenter\" title=\"A two-way table for observed counts, in which the columns are labeled &amp;quot;Yes,&amp;quot; &amp;quot;No,&amp;quot; (categories of Drank Alcohol in the last 2 hours?) 
and &amp;quot;Total.&amp;quot; The rows are labeled &amp;quot;Male,&amp;quot; &amp;quot;Female,&amp;quot; and &amp;quot;Total.&amp;quot; Here is the data in the table, given in cell format (&amp;quot;Row, Column: Value&amp;quot;): Male, Yes: 231; Male, No: 1212; Male, Total: 1443; Female, Yes: 48; Female, No: 366; Female, Total: 414; Total, Yes: 279; Total, No: 1578; Total, Total: 1857;\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image144.gif\" alt=\"A two-way table for observed counts, in which the columns are labeled &amp;quot;Yes,&amp;quot; &amp;quot;No,&amp;quot; (categories of Drank Alcohol in the last 2 hours?) and &amp;quot;Total.&amp;quot; The rows are labeled &amp;quot;Male,&amp;quot; &amp;quot;Female,&amp;quot; and &amp;quot;Total.&amp;quot; Here is the data in the table, given in cell format (&amp;quot;Row, Column: Value&amp;quot;): Male, Yes: 231; Male, No: 1212; Male, Total: 1443; Female, Yes: 48; Female, No: 366; Female, Total: 414; Total, Yes: 279; Total, No: 1578; Total, Total: 1857;\" \/><\/span><\/span><\/p>\n<p id=\"f4c3f6c3c4ab49ba9c83473eca82c052\">and see what would have happened if these were the original data. Because every count is multiplied by the same factor, the conditional percents remain the same:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"e0960f17817c4b15a13ba9187b342168\" class=\"img-responsive popimg aligncenter\" title=\"A two-way table for conditional counts, in which the columns are labeled &amp;quot;Yes&amp;quot; and &amp;quot;No&amp;quot; (categories of Drank Alcohol in the last 2 hours?). 
The rows are labeled &amp;quot;Male&amp;quot; and &amp;quot;Female.&amp;quot; Here is the data in the table, given in cell format (&amp;quot;Row, Column: Value&amp;quot;): Male, Yes: 231\/1443 = 16.0% Male, No: 1212\/1443 = 84.0% Female, Yes: 48\/414 = 11.6% Female, No: 366\/414 = 88.4%\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image145.gif\" alt=\"A two-way table for conditional counts, in which the columns are labeled &amp;quot;Yes&amp;quot; and &amp;quot;No&amp;quot; (categories of Drank Alcohol in the last 2 hours?). The rows are labeled &amp;quot;Male&amp;quot; and &amp;quot;Female.&amp;quot; Here is the data in the table, given in cell format (&amp;quot;Row, Column: Value&amp;quot;): Male, Yes: 231\/1443 = 16.0% Male, No: 1212\/1443 = 84.0% Female, Yes: 48\/414 = 11.6% Female, No: 366\/414 = 88.4%\" \/><\/span><\/span><\/p>\n<p id=\"ba34df6be90341948068618ca2f5eb6e\">In other words, the sample provides the \u201csame\u201d results, but this time they are based on a much larger sample (1857 instead of 619). The chi-square test reflects this difference: software now gives a chi-square statistic of 4.910 and a p-value of 0.027.<\/p>\n<p id=\"fc74c261239649239816ba89b0fc971e\">As before, H<sub>o<\/sub>\u00a0states that gender and drunk driving are not related; H<sub>a<\/sub>\u00a0states that they are related. Since the observed counts are triple what they were before, the expected counts are also tripled, so each cell\u2019s contribution, (observed - expected)<sup>2<\/sup> \/ expected, is tripled as well. Computed with software, which rounds less than a hand calculation does, the original chi-square statistic was 1.637. The chi-square statistic for the tripled data is therefore 3 times 1.637, or 4.91 (which is now in the \u201clarge\u201d range), and the p-value drops to 0.027.<\/p>\n<p id=\"a9bb0ad87e214026874b78e59566b937\">Now, we do reject H<sub>o<\/sub>, and we conclude that gender and drunk driving are related. 
In this case, the \u201clargest contribution to chi-square\u201d is large enough to provide evidence of a relationship. This is due to the fact that so few females drove drunk (48) compared to the number that would be expected (62.2, which is 414 * 279 \/ 1857) if the variables gender and drunk driving were not related. This contribution is\u00a0[latex]\\frac{\\left(48-62.2\\right)^2}{62.2}=3.242[\/latex].<\/p>\n<p id=\"dca01463b5b944928c4fd12215716cea\">Let\u2019s look at another example.<\/p>\n<div id=\"bd5884a1d2d0443cb77e91f114d4c4de\" class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<h4>Steroid Use in Sports<\/h4>\n<div>\n<p id=\"becad8787f9a44b79200166f4efdfe19\">Major-league baseball star Barry Bonds admitted to using a steroid cream during the 2003 season. Is steroid use different in baseball than in other sports? 
According to the 2001 National Collegiate Athletic Association (NCAA) survey (http:\/\/www.ncaa.org\/library\/research\/substance_use_habits\/2001\/substance_use_habits.pdf), a self-reported survey given to a stratified random selection of teams from each of the three NCAA divisions, reported steroid use among the top 5 college sports was as follows:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"bcf63532b809450cab495a119a853cff\" class=\"img-responsive popimg aligncenter\" title=\"A two-way table which has columns labeled &amp;quot;Reported Using Steroids,&amp;quot; &amp;quot;Reported Not Using Steroids,&amp;quot; and &amp;quot;Total.&amp;quot; The row labels are: &amp;quot;Men&amp;apos;s Baseball,&amp;quot; &amp;quot;Men&amp;apos;s Basketball,&amp;quot; &amp;quot;Men&amp;apos;s Football,&amp;quot; &amp;quot;Men&amp;apos;s Tennis,&amp;quot; &amp;quot;Men&amp;apos;s track\/field,&amp;quot; and &amp;quot;Total.&amp;quot; Here is the data in cell format (Row, Column: Value): Baseball, Reported Using: 26; Baseball, Reported Not Using: 1088; Baseball, Total: 1114; Basketball, Reported Using: 13; Basketball, Reported Not Using: 881; Basketball, Total: 894; Football, Reported Using: 59; Football, Reported Not Using: 1897; Football, Total: 1956; Tennis, Reported Using: 2; Tennis, Reported Not Using: 335; Tennis, Total: 337; Track\/Field, Reported Using: 6; Track\/Field, Reported Not Using: 486; Track\/Field, Total: 492; Total, Reported Using: 106; Total, Reported Not Using: 4687; Total, Total: 4793;\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image188.gif\" alt=\"A two-way table which has columns labeled &amp;quot;Reported Using Steroids,&amp;quot; &amp;quot;Reported Not Using Steroids,&amp;quot; and &amp;quot;Total.&amp;quot; The row labels are: &amp;quot;Men&amp;apos;s Baseball,&amp;quot; &amp;quot;Men&amp;apos;s 
Basketball,&amp;quot; &amp;quot;Men&amp;apos;s Football,&amp;quot; &amp;quot;Men&amp;apos;s Tennis,&amp;quot; &amp;quot;Men&amp;apos;s track\/field,&amp;quot; and &amp;quot;Total.&amp;quot; Here is the data in cell format (Row, Column: Value): Baseball, Reported Using: 26; Baseball, Reported Not Using: 1088; Baseball, Total: 1114; Basketball, Reported Using: 13; Basketball, Reported Not Using: 881; Basketball, Total: 894; Football, Reported Using: 59; Football, Reported Not Using: 1897; Football, Total: 1956; Tennis, Reported Using: 2; Tennis, Reported Not Using: 335; Tennis, Total: 337; Track\/Field, Reported Using: 6; Track\/Field, Reported Not Using: 486; Track\/Field, Total: 492; Total, Reported Using: 106; Total, Reported Not Using: 4687; Total, Total: 4793;\" \/><\/span><\/span><\/p>\n<p id=\"b2418376aaa54848a029766bfe61ff6d\">Do the data provide evidence of a significant relationship between steroid use and the type of sport? In other words, are there significant differences in steroid use among the different sports?<\/p>\n<p id=\"a38359b7a237487885ea87ab24174ca0\">Before we carry out the chi-square test for independence, let\u2019s get a sense of the data by calculating the conditional percents:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"cb180d9913a9473a98641f1720666ff8\" class=\"img-responsive popimg aligncenter\" title=\"A two-way conditional percent table which has columns labeled &amp;quot;Reported Using Steroids,&amp;quot; &amp;quot;Reported Not Using Steroids,&amp;quot; and &amp;quot;Total.&amp;quot; The row labels are: &amp;quot;Men&amp;apos;s Baseball,&amp;quot; &amp;quot;Men&amp;apos;s Basketball,&amp;quot; &amp;quot;Men&amp;apos;s Football,&amp;quot; &amp;quot;Men&amp;apos;s Tennis,&amp;quot; &amp;quot;Men&amp;apos;s track\/field,&amp;quot; and &amp;quot;Total.&amp;quot; Here is the data in cell format (Row, Column: Value): Baseball, Reported Using: 2.3%; Baseball, Reported Not Using: 97.7%; Baseball, 
Total: 1114; Basketball, Reported Using: 1.5%; Basketball, Reported Not Using: 98.5%; Basketball, Total: 894; Football, Reported Using: 3%; Football, Reported Not Using: 97%; Football, Total: 1956; Tennis, Reported Using: 0.6%; Tennis, Reported Not Using: 99.4%; Tennis, Total: 337; Track\/Field, Reported Using: 1.2%; Track\/Field, Reported Not Using: 98.8%; Track\/Field, Total: 492; Total, Reported Using: 106; Total, Reported Not Using: 4687; Total, Total: 4793;\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image189.gif\" alt=\"A two-way conditional percent table which has columns labeled &amp;quot;Reported Using Steroids,&amp;quot; &amp;quot;Reported Not Using Steroids,&amp;quot; and &amp;quot;Total.&amp;quot; The row labels are: &amp;quot;Men&amp;apos;s Baseball,&amp;quot; &amp;quot;Men&amp;apos;s Basketball,&amp;quot; &amp;quot;Men&amp;apos;s Football,&amp;quot; &amp;quot;Men&amp;apos;s Tennis,&amp;quot; &amp;quot;Men&amp;apos;s track\/field,&amp;quot; and &amp;quot;Total.&amp;quot; Here is the data in cell format (Row, Column: Value): Baseball, Reported Using: 2.3%; Baseball, Reported Not Using: 97.7%; Baseball, Total: 1114; Basketball, Reported Using: 1.5%; Basketball, Reported Not Using: 98.5%; Basketball, Total: 894; Football, Reported Using: 3%; Football, Reported Not Using: 97%; Football, Total: 1956; Tennis, Reported Using: 0.6%; Tennis, Reported Not Using: 99.4%; Tennis, Total: 337; Track\/Field, Reported Using: 1.2%; Track\/Field, Reported Not Using: 98.8%; Track\/Field, Total: 492; Total, Reported Using: 106; Total, Reported Not Using: 4687; Total, Total: 4793;\" \/><\/span><\/span><\/p>\n<p id=\"b6d756bc072a48ed8016b0b00069189c\">The conditional percents suggest that reported steroid use differs among the sports. The differences are not overwhelming, but because the sample size is so large, they might still be statistically significant. 
Let\u2019s carry out the test and see.<\/p>\n<p id=\"d3ffb1060fc94f8eb92f43776857e2ba\"><em class=\"italic\">Step 1: Stating the hypotheses<\/em><\/p>\n<p id=\"e82d5fc9515249028fec93b83a328369\">The hypotheses are:<\/p>\n<p id=\"b4df6e71ca95400a9f325ca86bde9f20\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">o<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0Steroid use is not related to the type of sport (or: type of sport and steroid use are independent).<\/p>\n<p id=\"de46534ed2af462a9a9c734dcb99a9d3\"><em class=\"italic\">H<\/em><sub><em class=\"italic\">a<\/em><\/sub><em class=\"italic\">:<\/em>\u00a0Steroid use is related to the type of sport (or: type of sport and steroid use are not independent).<\/p>\n<p id=\"acbd9fdea65e478286a6fca3c5b16cbc\"><em class=\"italic\">Step 2: Checking conditions and finding the test statistic<\/em><\/p>\n<p id=\"d246d8faaec7441ab5e345bda586d524\">Here is the Minitab output of the chi-square test for this example:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"d326d66b43fb4b1e8720b0f1f4ad7861\" class=\"img-responsive popimg aligncenter\" title=\"Chi-Square Test: men used, men not used. 
Baseball: men used: Observed: 26, Expected: 24.64, Chi-Square contribution: 0.075; Baseball: men not used: Observed: 1088, Expected: 1089.36, Chi-Square contribution: 0.002; Baseball: Total: 1114; Basketball: men used: Observed: 13, Expected: 19.77, Chi-Square contribution: 2.319; Basketball: men not used: Observed: 881, Expected: 874.23, Chi-Square contribution: 0.052; Basketball: Total: 894; Football: men used: Observed: 59, Expected: 43.26, Chi-Square contribution: 5.729; Football: men not used: Observed: 1897, Expected: 1912.74, Chi-Square contribution: 0.130; Football: Total: 1956; Tennis: men used: Observed: 2, Expected: 7.45, Chi-Square contribution: 3.990; Tennis: men not used: Observed: 335, Expected: 329.55, Chi-Square contribution: 0.090; Tennis: Total: 337; track\/field: men used: Observed: 6, Expected: 10.88, Chi-Square contribution: 2.189; track\/field: men not used: Observed: 486, Expected: 481.12, Chi-Square contribution: 0.050; track\/field: Total: 492; Total: Men used: 106, Men not Used: 4687, Total: 4793; Chi-Sq = 14.626, DF = 4, P-Value = 0.006\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image190.gif\" alt=\"Chi-Square Test: men used, men not used. 
Baseball: men used: Observed: 26, Expected: 24.64, Chi-Square contribution: 0.075; Baseball: men not used: Observed: 1088, Expected: 1089.36, Chi-Square contribution: 0.002; Baseball: Total: 1114; Basketball: men used: Observed: 13, Expected: 19.77, Chi-Square contribution: 2.319; Basketball: men not used: Observed: 881, Expected: 874.23, Chi-Square contribution: 0.052; Basketball: Total: 894; Football: men used: Observed: 59, Expected: 43.26, Chi-Square contribution: 5.729; Football: men not used: Observed: 1897, Expected: 1912.74, Chi-Square contribution: 0.130; Football: Total: 1956; Tennis: men used: Observed: 2, Expected: 7.45, Chi-Square contribution: 3.990; Tennis: men not used: Observed: 335, Expected: 329.55, Chi-Square contribution: 0.090; Tennis: Total: 337; track\/field: men used: Observed: 6, Expected: 10.88, Chi-Square contribution: 2.189; track\/field: men not used: Observed: 486, Expected: 481.12, Chi-Square contribution: 0.050; track\/field: Total: 492; Total: Men used: 106, Men not Used: 4687, Total: 4793; Chi-Sq = 14.626, DF = 4, P-Value = 0.006\" \/><\/span><\/span><\/p>\n<ul id=\"ab26561d131141a896cb364787c91901\" class=\"none\">\n<li>\n<p id=\"e07c152be16a4c1487d479a1a7cce031\">Conditions:<\/p>\n<ol id=\"d845e42a027f4d78894d5f5b29bb6a7e\" class=\"lower-roman\">\n<li>\n<p id=\"ee1916d236bf4fd8bfb7426d803c4835\">We are told that the sample was random.<\/p>\n<\/li>\n<li>\n<p id=\"b954a2e9ce85494db812e7f452494570\">All the expected counts are above 5 (the smallest is 7.45).<\/p>\n<\/li>\n<\/ol>\n<\/li>\n<li>\n<p id=\"a3eabd87533843f08ea9659e57929d6f\">Test statistic:<\/p>\n<p id=\"fb766867822a442cb5bf845621a494d6\">The test statistic is 14.626. Note that the \u201clargest contributors\u201d to the test statistic are 5.729 and 3.990. The first cell corresponds to football players who reported using steroids, with an observed count larger than we would expect to see under independence. 
The second cell corresponds to tennis players who reported using steroids, and has an observed count lower than we would expect under independence.<\/p>\n<\/li>\n<\/ul>\n<p id=\"c34478b2e49f4f44a4be5fb92d5a6f5e\"><em class=\"italic\">Step 3: Finding the p-value<\/em><\/p>\n<p id=\"b764be5ae3174b21853cb95b5454bf20\">According to the p-value in the output, it would be extremely unlikely (probability of 0.006) to get counts like those observed, or counts even further from the expected ones, if the null hypothesis were true. In other words, it would be very surprising to get data like those observed if steroid use were not related to sport type.<\/p>\n<p id=\"ef3640dc082e45638035dc45b7776433\"><em class=\"italic\">Step 4: Conclusion<\/em><\/p>\n<p id=\"c381cacca54a4aadb51a87b21b8c3f7d\">The small p-value indicates that the data provide strong evidence against the null hypothesis, so we reject it and conclude that steroid use is related to the type of sport.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"b98a6983a7944948b594f6813adb87be\" class=\"section purposewrap\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Let\u2019s Summarize<\/span><\/h2>\n<ul id=\"f34f5e5297c846dd8f7a8f0be74f9569\">\n<li>\n<p id=\"c5bcd91a752347579d8176e7b7210198\">The chi-square test for independence is used to test whether the relationship between two categorical variables is significant. 
In other words, the chi-square procedure assesses whether the data provide enough evidence that a true relationship between the two variables exists in the population.<\/p>\n<\/li>\n<li>\n<p id=\"d11ff31417854a89bee0602e25d2af94\">The hypotheses that are being tested in the chi-square test for independence are:<\/p>\n<ul id=\"ac29df7dbd6c4009bc161dcd5b13b2f9\" class=\"none\">\n<li>\n<p id=\"deeabf81e60b49d7bef2b085f48b7e5f\">H<sub>o<\/sub>: There is no relationship between \u2026 and \u2026.<\/p>\n<\/li>\n<li>\n<p id=\"b3f19aff88014fa2b28e0815112096fc\">H<sub>a<\/sub>: There is a relationship between \u2026 and \u2026.<\/p>\n<p id=\"fbf5e517934e421e8be1e4aa8cb2c132\">\n<\/li>\n<li>\n<p id=\"b4d2b81d988c4a5cbeb877db54a2c74d\">or equivalently,<\/p>\n<p id=\"fbe794fae8d04ccfa8060d964a829aaf\">\n<\/li>\n<li>\n<p id=\"d9db036cb4f04804aacb0f45cac3e620\">H<sub>o<\/sub>: The variables \u2026 and \u2026 are independent.<\/p>\n<\/li>\n<li>\n<p id=\"d0bb5a4b3b8a46d08da23973181b9bdf\">H<sub>a<\/sub>: The variables \u2026 and \u2026 are not independent.<\/p>\n<\/li>\n<\/ul>\n<\/li>\n<li>\n<p id=\"bcaff40a3fb74bccbae50e79b919f933\">The idea behind the test is measuring how far the observed data are from the null hypothesis by comparing the observed counts to the expected counts\u2014the counts that we would expect to see (instead of the observed ones) had the null hypothesis been true. 
The expected count of each cell is calculated as follows:<span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"d8f101a0be7a497caa67e7c069b4b547\" class=\"img-responsive popimg aligncenter\" title=\"\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image523f.png\" alt=\"Expected count = (column total * row total) \/ table total\" \/><\/span><\/span><\/p>\n<p>&nbsp;<\/li>\n<li>\n<p id=\"a80996991b2d4e89b62aa910b27408e4\">The measure of the difference between the observed and expected counts is the chi-square test statistic, whose null distribution is called the chi-square distribution. The chi-square test statistic is calculated as follows:<span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"c3e73562f9ec4d51aaad9e9c05755f1d\" class=\"img-responsive popimg aligncenter\" title=\"\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u5_inference\/_m2_inference_for_relationships\/webcontent\/image523g.png\" alt=\"Chi-square statistic = sum over all cells of (observed count - expected count) squared \/ expected count\" \/><\/span><\/span><\/p>\n<p>&nbsp;<\/li>\n<li>\n<p id=\"e09cb57270364676bbf2ede5625654a6\">Once we verify that the conditions that allow us to safely use the chi-square test are met, we use software to carry it out and use the p-value to guide our 
conclusions.<\/p>\n<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"author":150,"menu_order":18,"template":"","meta":{"pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[48],"contributor":[],"license":[],"class_list":["post-577","chapter","type-chapter","status-publish","hentry","chapter-type-numberless"],"part":421,"_links":{"self":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters\/577","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/users\/150"}],"version-history":[{"count":13,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters\/577\/revisions"}],"predecessor-version":[{"id":1148,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters\/577\/revisions\/1148"}],"part":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/parts\/421"}],"metadata":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters\/577\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/media?parent=577"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapter-type?post=577"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/contributor?post=577"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/license
?post=577"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}