{"id":487,"date":"2024-10-18T02:01:07","date_gmt":"2024-10-18T02:01:07","guid":{"rendered":"https:\/\/pressbooks.ccconline.org\/mat1260\/?post_type=chapter&#038;p=487"},"modified":"2024-12-11T21:12:05","modified_gmt":"2024-12-11T21:12:05","slug":"3-3-correlation-coefficient","status":"publish","type":"chapter","link":"https:\/\/pressbooks.ccconline.org\/mat1260\/chapter\/3-3-correlation-coefficient\/","title":{"raw":"3.3: Correlation Coefficient","rendered":"3.3: Correlation Coefficient"},"content":{"raw":"<div class=\"\">\r\n<h2>Introduction<\/h2>\r\n<\/div>\r\n<div class=\"section\">\r\n<div class=\"sectionContain\">\r\n<p id=\"N10B07\">So far we have visualized relationships between two quantitative variables using scatterplots, and described the overall pattern of a relationship by considering its direction, form, and strength. We noted that assessing the strength of a relationship just by looking at the scatterplot is quite difficult, and therefore we need to supplement the scatterplot with some kind of numerical measure that will help us assess the strength.<\/p>\r\nIn this part, we will restrict our attention to the\u00a0<em>special case of relationships that have a linear form<\/em>, since they are quite common and relatively simple to detect. More importantly, there exists a numerical measure that assesses the strength of the linear relationship between two quantitative variables with which we can supplement the scatterplot. We will introduce this numerical measure here and discuss it in detail.\r\n<p id=\"N10B10\">Even though from this point on we are going to focus only on linear relationships, it is important to remember that\u00a0<em>not every relationship between two quantitative variables has a linear form.<\/em>\u00a0We have actually seen several examples of relationships that are not linear. 
The statistical tools that will be introduced here are\u00a0<em>appropriate only for examining linear relationships,<\/em>\u00a0and as we will see, when they are used in nonlinear situations, these tools can lead to errors in reasoning.<\/p>\r\nLet\u2019s start with a motivating example. Consider the following two scatterplots.\r\n\r\n<span class=\"imagewrap\"><span class=\"image\"><img class=\"img-responsive popimg aligncenter\" title=\"Two scatterplots, displaying the same data. However, the first scatterplot has a much larger scale on its axes than the second. Because of this, the first scatterplot has its data points clustered closer together than in the second scatterplot. Both have their points arranged roughly in a linear manner. In addition, the second scatterplot appears to have some outliers. All this is merely due to the scale change.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/linear1.gif\" alt=\"Two scatterplots, displaying the same data. However, the first scatterplot has a much larger scale on its axes than the second. Because of this, the first scatterplot has its data points clustered closer together than in the second scatterplot. Both have their points arranged roughly in a linear manner. In addition, the second scatterplot appears to have some outliers. All this is merely due to the scale change.\" \/><\/span><\/span>\r\n<p id=\"N10B22\">We can see that in both cases, the direction of the relationship is\u00a0<em>positive<\/em>\u00a0and the form of the relationship is\u00a0<em>linear<\/em>. What about the strength? 
Recall that the strength of a relationship is the extent to which the data follow its form.<\/p>\r\n\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\n[h5p id=\"43\"]\r\n\r\n<\/div>\r\n<\/div>\r\nThe purpose of this example was to illustrate how assessing the strength of the linear relationship from a scatterplot alone is problematic, since our judgment might be affected by the scale on which the values are plotted. This example, therefore, provides a motivation for the\u00a0<em>need\u00a0<\/em>to supplement the scatterplot with a\u00a0<em>numerical measure<\/em>\u00a0that will\u00a0<em>measure the strength<\/em>\u00a0of the linear relationship between two quantitative variables.\r\n<div class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">The Correlation Coefficient\u2014r<\/span><\/h2>\r\n<p id=\"N10B06\">The numerical measure that assesses the strength of a linear relationship is called the\u00a0<em>correlation coefficient<\/em>\u00a0and is denoted by\u00a0<em>r<\/em>. 
We will<\/p>\r\n\r\n<ul>\r\n \t<li>Define the correlation r.<\/li>\r\n \t<li>Discuss the calculation of r.<\/li>\r\n \t<li>Explain how to interpret the value of r.<\/li>\r\n \t<li>Talk about some of the properties of r.<\/li>\r\n<\/ul>\r\n<em>Definition:\u00a0<\/em>The correlation coefficient (r) is a numerical measure of the\u00a0<em>strength<\/em>\u00a0and\u00a0<em>direction<\/em>\u00a0of a linear relationship between two quantitative variables.\r\n\r\n<em>Calculation:\u00a0<\/em>r is calculated using the following formula: [latex]r=\\frac{1}{n-1}\\sum_{i=1}^{n}\\left(\\frac{x_i-\\bar{x}}{s_x}\\right)\\left(\\frac{y_i-\\bar{y}}{s_y}\\right)[\/latex] where [latex]\\bar{x}[\/latex] and [latex]\\bar{y}[\/latex] are the sample means, [latex]s_x[\/latex] and [latex]s_y[\/latex] are the sample standard deviations, and n is the number of data points.\r\n<p id=\"N10BD5\">However, the calculation of the correlation (r) is not the focus of this course. We will use a statistics package to calculate r for us, and the emphasis of this course is on the\u00a0<em class=\"italic\">interpretation<\/em>\u00a0of its value.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div id=\"N10BDE\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Interpretation<\/span><\/h2>\r\n<p id=\"N10BE5\">Once we obtain the value of r, its interpretation with respect to the strength of linear relationships is quite simple, as this walkthrough illustrates:<\/p>\r\nhttps:\/\/youtube.com\/watch?v=Bt-Ey2ebfvs\r\n<div class=\"figurewrap\">\r\n<div class=\"figure clearfix\">\r\n<div id=\"uwrap__i_0\" class=\"youtube\">\r\n\r\n<span style=\"text-align: initial; font-size: 1em;\">To get a better sense of how the value of r relates to the strength of the linear relationship, take a look at the activity below.<\/span>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div class=\"sectionContain\">\r\n\r\n<iframe id=\"_i_1\" 
src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/webcontent\/html_activities\/linear_relationships_2_of_8\/linear_relationships_scatterplot.html\" width=\"750\" height=\"700\" frameborder=\"0\" data-gtm-yt-inspected-6=\"true\"><\/iframe>\r\n<p id=\"N10C01\">The slider bar at the top of the HTML activity allows us to vary the value of the correlation coefficient (r) between \u22121 and 1 in order to observe the effect on a scatterplot. Click the\u00a0<em class=\"bold\">Switch Sign<\/em>\u00a0button to change the sign of the correlation (positive or negative) while keeping the value the same.<\/p>\r\nNow that we understand the use of\u00a0<em>r<\/em>\u00a0as a numerical measure for assessing the direction and strength of linear relationships between quantitative variables, we will look at a few examples.\r\n<div class=\"examplewrap\">\r\n<div class=\"exHead\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<div class=\"examplewrap\">\r\n<h4 class=\"exHead\">Highway Sign Visibility<\/h4>\r\n<div class=\"example clearfix\">\r\n<div>\r\n<p id=\"N10B13\">Earlier, we used the scatterplot below to find a\u00a0<em>negative linear<\/em>\u00a0relationship between the age of a driver and the maximum distance at which a highway sign was legible. What about the strength of the relationship? It turns out that the correlation between the two variables is r = -0.793.<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img class=\"img-responsive popimg aligncenter\" title=\"The scatterplot for highway sign visibility. 
The vertical axis is labeled &quot;Sign Legibility Distance (feet)&quot; and the horizontal axis is labeled &quot;Driver Age (years)&quot;\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/linear3.gif\" alt=\"The scatterplot for highway sign visibility. The vertical axis is labeled &quot;Sign Legibility Distance (feet)&quot; and the horizontal axis is labeled &quot;Driver Age (years)&quot;\" \/><\/span><\/span>\r\n<p id=\"N10B1F\">Since r &lt; 0, it confirms that the direction of the relationship is negative (although we really didn\u2019t need r to tell us that). Since\u00a0<em>r<\/em>\u00a0is relatively close to -1, it suggests that the relationship is moderately strong. In context, the negative correlation confirms that the maximum distance at which a sign is legible generally decreases with age. Since the value of r indicates that the linear relationship is moderately strong, but not perfect, we can expect the maximum distance to vary somewhat, even among drivers of the same age.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div class=\"examplewrap\">\r\n<div class=\"exHead\"><span class=\"scnReader\">Example<\/span><\/div>\r\n<div class=\"example clearfix\">\r\n<h4>Statistics Courses<\/h4>\r\n<div>\r\n<p id=\"N10B2B\">A statistics department is interested in tracking the progress of its students from entry until graduation. As part of the study, the department tabulates the performance of 10 students in an introductory course and in an upper-level course required for graduation. What is the relationship between the students\u2019 course averages in the two courses? 
Here is the scatterplot for the data:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img class=\"img-responsive popimg aligncenter\" title=\"A scatterplot for the data, in which the vertical axis is labeled &quot;Upper Level Course Average&quot; and the horizontal axis is labeled &quot;Introductory Course Average&quot;.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/linear4.gif\" alt=\"A scatterplot for the data, in which the vertical axis is labeled &quot;Upper Level Course Average&quot; and the horizontal axis is labeled &quot;Introductory Course Average&quot;.\" \/><\/span><\/span>\r\n<p id=\"N10B34\">The scatterplot suggests a relationship that is\u00a0<em>positive<\/em>\u00a0in direction,\u00a0<em>linear<\/em>\u00a0in form, and seems quite strong. The value of the correlation that we find between the two variables is\u00a0<em>r<\/em>\u00a0= 0.931, which is very close to 1, and thus confirms that indeed the linear relationship is very strong.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\nComment\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"section purposewrap\">\r\n<div class=\"sectionContain\">\r\n<p id=\"N10B48\">Note that in both examples we supplemented the scatterplot with the correlation (r). Now that we have the correlation (r), why do we still need to look at a scatterplot when examining the relationship between two quantitative variables?<\/p>\r\n<p id=\"N10B4B\">The\u00a0<em>correlation<\/em>\u00a0coefficient can\u00a0<em>only<\/em>\u00a0be interpreted as the\u00a0<em>measure of the strength of a linear relationship<\/em>, so we need the scatterplot to verify that the relationship indeed looks linear. 
This point and its importance will be clearer after we examine a few properties of r.<\/p>\r\n\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\n<img id=\"N10077\" class=\"img-responsive popimg aligncenter\" title=\"scatterplot graph\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/webcontent\/flash\/didigetthis_linear3q1_image1.gif\" alt=\"scatterplot graph\" \/>\r\n<p style=\"text-align: center;\">[h5p id=\"44\"]<\/p>\r\n&nbsp;\r\n\r\n<img id=\"N10077\" class=\"img-responsive popimg aligncenter\" title=\"\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/webcontent\/flash\/didigetthis_linear3q1_image2.gif\" alt=\"\" \/>\r\n<p style=\"text-align: center;\">[h5p id=\"45\"]<\/p>\r\n&nbsp;\r\n\r\n<img id=\"N10077\" class=\"img-responsive popimg aligncenter\" title=\"\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/webcontent\/flash\/didigetthis_linear3q1_image3.gif\" alt=\"\" \/>\r\n<p style=\"text-align: center;\">[h5p id=\"46\"]<\/p>\r\n&nbsp;\r\n\r\n<img id=\"N10077\" class=\"img-responsive popimg aligncenter\" title=\"\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/webcontent\/flash\/didigetthis_linear3q1_image4.gif\" alt=\"\" \/>\r\n<p style=\"text-align: center;\">[h5p id=\"47\"]<\/p>\r\n&nbsp;\r\n\r\n<img id=\"N10077\" class=\"img-responsive popimg aligncenter\" title=\"\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/webcontent\/flash\/didigetthis_linear3q1_image5.gif\" alt=\"\" \/>\r\n<p style=\"text-align: center;\">[h5p id=\"48\"]<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<h2><span title=\"Quick scroll up\">Properties 
of r<\/span><\/h2>\r\n<p id=\"e96c9a3bd4a64bcfba3e0ff8f5cbb287\">We now discuss and illustrate several important properties of the correlation coefficient as a numerical measure of the strength of a linear relationship.<\/p>\r\n\r\n<ol id=\"e3dd58df5cd14b99a6f0e4c442aac425\">\r\n \t<li>\r\n<p id=\"ae36bbef823c4cd2bac8e30e962e48e1\">The correlation does not change when the units of measurement of either one of the variables change. In other words, if we change the units of measurement of the explanatory variable and\/or the response variable, the change has\u00a0<em class=\"italic\">no effect on the correlation (r)<\/em>.<\/p>\r\n<p id=\"f5c506e9b3ee4265a7c947828b9b72bf\">To illustrate, following are two versions of the scatterplot of the relationship between sign legibility distance and driver\u2019s age:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"afcc9e2dabf648aca55bc2e5c7650136\" class=\"img-responsive popimg aligncenter\" title=\"Two scatterplots showing the data for Driver Age vs. Sign Legibility. The first scatterplot&amp;apos;s vertical axis is labeled &amp;quot;Sign Legibility Distance (feet)&amp;quot; and the axis ranges from a little less than 300 to 600 feet. The horizontal axis is labeled &amp;quot;Driver Age (years)&amp;quot; and it ranges from 15 to 85. The second scatterplot has the same horizontal axis but the vertical axis is labeled &amp;quot;Sign Legibility Distance (meters)&amp;quot; and it ranges from 80 to 180.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/linear5.gif\" alt=\"Two scatterplots showing the data for Driver Age vs. Sign Legibility. The first scatterplot&amp;apos;s vertical axis is labeled &amp;quot;Sign Legibility Distance (feet)&amp;quot; and the axis ranges from a little less than 300 to 600 feet. The horizontal axis is labeled &amp;quot;Driver Age (years)&amp;quot; and it ranges from 15 to 85. 
The second scatterplot has the same horizontal axis but the vertical axis is labeled &amp;quot;Sign Legibility Distance (meters)&amp;quot; and it ranges from 80 to 180.\" \/><\/span><\/span>\r\n<p id=\"cd53e966f9ec4c2eae76efae773274b7\">The top scatterplot displays the original data, where the maximum distance is measured in\u00a0<em class=\"italic\">feet<\/em>. The bottom scatterplot displays the same relationship but with the maximum distances converted to\u00a0<em class=\"italic\">meters<\/em>. Notice that the Y-values have changed, but the correlations are the same. This example illustrates how changing the units of measurement of the response variable has no effect on r, but as we indicated above, the same is true for changing the units of the explanatory variable, or of both variables.<\/p>\r\n<p id=\"e3010178737a4d10adec55c54cab0e76\">This might be a good place to comment that the correlation (r) is\u00a0<em class=\"italic\">unitless<\/em>. It is just a number.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"be35cfe930a548a5b372a084fb89db6f\">The correlation measures only the\u00a0<em class=\"italic\">strength<\/em>\u00a0of a linear relationship between two variables. It\u00a0<em class=\"italic\">ignores<\/em>\u00a0any other type of relationship, no matter how strong it is. 
For example, consider the relationship between the average fuel usage of driving a fixed distance in a car, and the speed at which the car drives:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"b6b8956f127447b49d8b765bc1cc24d2\" class=\"img-responsive popimg aligncenter\" title=\"A scatterplot in which the vertical axis is labeled &amp;quot;Fuel Used (liter\/100km)&amp;quot; and the horizontal axis is labeled &amp;quot;Speed (km\/h).&amp;quot; The amount of fuel used rapidly decreases from speed=0 to about speed=60, where fuel used reaches its minimum value, then fuel used slowly increases linearly as speed increases.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/linear6.gif\" alt=\"A scatterplot in which the vertical axis is labeled &amp;quot;Fuel Used (liter\/100km)&amp;quot; and the horizontal axis is labeled &amp;quot;Speed (km\/h).&amp;quot; The amount of fuel used rapidly decreases from speed=0 to about speed=60, where fuel used reaches its minimum value, then fuel used slowly increases linearly as speed increases.\" \/><\/span><\/span>\r\n<p id=\"b70dd5fff0574d0eab703b84bfac1b10\">Our data describe a fairly simple curvilinear relationship: the amount of fuel consumed decreases rapidly to a minimum for a car driving 60 kilometers per hour, and then increases gradually for speeds exceeding 60 kilometers per hour. The relationship is very strong, as the observations seem to perfectly fit the curve.<\/p>\r\n<p id=\"f5b45c622e6a4e6081f60907d4f9be02\">Although the relationship is strong, the correlation r = -0.172 indicates a weak\u00a0<em class=\"italic\">linear<\/em>\u00a0relationship. 
This makes sense considering that the data fail to adhere closely to a linear form:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"dca170fdf7f9479eb5f7a001eee62212\" class=\"img-responsive popimg aligncenter\" title=\"The same scatterplot, except a blue line with arrow has been drawn over the plot, in the direction of a negative relationship. The data plotted does not align at all with this arrow.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/linear7.gif\" alt=\"The same scatterplot, except a blue line with arrow has been drawn over the plot, in the direction of a negative relationship. The data plotted does not align at all with this arrow.\" \/><\/span><\/span>\r\n<p id=\"fc330a5485194ed286c32b5158529169\">The correlation is useless for assessing the strength of any type of relationship that is not linear (including relationships that are curvilinear, such as the one in our example). Beware, then, of interpreting the fact that r is close to 0 as an indicator of a weak relationship rather than a weak\u00a0<em class=\"italic\">linear<\/em>\u00a0relationship. This example also illustrates how important it is to\u00a0<em class=\"italic\">always look at the data in the scatterplot<\/em>\u00a0because, as in our example, there might be a strong nonlinear relationship that r does not indicate.<\/p>\r\n<p id=\"f9ca185e14494143889b8643aabf20e1\">Since the correlation was nearly zero when the form of the relationship was not linear, we might ask if the correlation can be used to determine whether or not a relationship is linear.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"bd31ea0fa982419d8de55562e50e178e\">The correlation by itself is\u00a0<em class=\"italic\">not<\/em>\u00a0sufficient to determine whether a relationship is linear. To see this, let\u2019s consider the study that examined the effect of monetary incentives on the return rate of questionnaires. 
Below is the scatterplot relating the percentage of participants who completed a survey to the monetary incentive that researchers promised to participants, in which we find a\u00a0<em class=\"italic\">strong curvilinear relationship<\/em>:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"bfe57e88456c49faba68897bf611a883\" class=\"img-responsive popimg aligncenter\" title=\"A scatterplot in which the vertical axis shows &amp;quot;Percentage Returned&amp;quot; and the horizontal axis shows &amp;quot;Incentive (dollars). &amp;quot; The data plotted shows a strong curvilinear relationship which roughly approximates a square root function.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/linear8.gif\" alt=\"A scatterplot in which the vertical axis shows &amp;quot;Percentage Returned&amp;quot; and the horizontal axis shows &amp;quot;Incentive (dollars). &amp;quot; The data plotted shows a strong curvilinear relationship which roughly approximates a square root function.\" \/><\/span><\/span>\r\n<p id=\"c4bc026658ae41cdbd321c8c2c8d261a\">The relationship is curvilinear, yet the correlation r = 0.876 is quite close to 1.<\/p>\r\n<p id=\"fdbccafb661e4ad186eae6fe5ed9215a\">In the last two examples, we have seen two very strong curvilinear relationships, one with a correlation close to 0 and one with a correlation close to 1. Therefore, the correlation alone does not indicate whether a relationship is linear. The important principle here is:<\/p>\r\n<p id=\"a74e2a665316441a89edbfc0215e93f0\"><em class=\"italic\">Always look at the data!<\/em><\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"c0eb3ac1e6d44b95b01f1de71076ab7d\">The correlation is heavily influenced by outliers. 
As you will learn in the next two activities, the way in which the outlier influences the correlation depends on whether the outlier is consistent with the pattern of the linear relationship.<\/p>\r\n<p id=\"c33062d40346484d99886ba893020043\">Using the applet below, we explore how an outlier affects the correlation.<\/p>\r\n<p id=\"c33062d40346484d99886ba893020043\"><iframe id=\"c1c8fc63202240a880ede1d58d0287db\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/webcontent\/html_activities\/linear_relationships_scatterplot\/linear_relationships_scatterplot.html\" width=\"750\" height=\"700\" frameborder=\"0\" data-gtm-yt-inspected-6=\"true\" data-mce-fragment=\"1\"><\/iframe><\/p>\r\n<p id=\"fe59fa688c7547dca7920b4bd83b8180\">To see how an outlier affects the correlation, do the following:<\/p>\r\n\r\n<ol id=\"c1ec20c94ed04f1bae60aa6cb2490bf2\">\r\n \t<li>\r\n<p id=\"bf8a070ca0684b6d80eda27505283c9b\">Fill the scatterplot with a hypothetical positive linear relationship between X and Y (by clicking on the graph about a dozen times starting at lower left and going up diagonally to the top right). Pay attention to the correlation coefficient calculated at the top right of the applet. (Clicking on the garbage can will let you start over.)<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"e6baf066d3ad4da7b52198023fbf97dd\">Once you are satisfied with your hypothetical data, create an outlier by clicking on one of the data points in the upper right of the graph, and dragging it down along the right side of the graph. 
Again, pay attention to what happens to the value of the correlation.<\/p>\r\n<\/li>\r\n<\/ol>\r\n<\/li>\r\n<\/ol>\r\n<p id=\"b5b8e6d7a0fe4d70a8d02ef0008fe985\">Hopefully, you\u2019ve noticed the correlation decreasing when you created this kind of outlier, which\u00a0<em class=\"italic\">is not consistent<\/em> with the pattern of the relationship.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>","rendered":"<div class=\"\">\n<h2>Introduction<\/h2>\n<\/div>\n<div class=\"section\">\n<div class=\"sectionContain\">\n<p id=\"N10B07\">So far we have visualized relationships between two quantitative variables using scatterplots, and described the overall pattern of a relationship by considering its direction, form, and strength. We noted that assessing the strength of a relationship just by looking at the scatterplot is quite difficult, and therefore we need to supplement the scatterplot with some kind of numerical measure that will help us assess the strength.<\/p>\n<p>In this part, we will restrict our attention to the\u00a0<em>special case of relationships that have a linear form<\/em>, since they are quite common and relatively simple to detect. More importantly, there exists a numerical measure that assesses the strength of the linear relationship between two quantitative variables with which we can supplement the scatterplot. We will introduce this numerical measure here and discuss it in detail.<\/p>\n<p id=\"N10B10\">Even though from this point on we are going to focus only on linear relationships, it is important to remember that\u00a0<em>not every relationship between two quantitative variables has a linear form.<\/em>\u00a0We have actually seen several examples of relationships that are not linear. 
The statistical tools that will be introduced here are\u00a0<em>appropriate only for examining linear relationships,<\/em>\u00a0and as we will see, when they are used in nonlinear situations, these tools can lead to errors in reasoning.<\/p>\n<p>Let\u2019s start with a motivating example. Consider the following two scatterplots.<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"Two scatterplots, displaying the same data. However, the first scatterplot has a much larger scale on its axes than the second. Because of this, the first scatterplot has its data points clustered closer together than in the second scatterplot. Both have their points arranged roughly in a linear manner. In addition, the second scatterplot appears to have some outliers. All this is merely due to the scale change.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/linear1.gif\" alt=\"Two scatterplots, displaying the same data. However, the first scatterplot has a much larger scale on its axes than the second. Because of this, the first scatterplot has its data points clustered closer together than in the second scatterplot. Both have their points arranged roughly in a linear manner. In addition, the second scatterplot appears to have some outliers. All this is merely due to the scale change.\" \/><\/span><\/span><\/p>\n<p id=\"N10B22\">We can see that in both cases, the direction of the relationship is\u00a0<em>positive<\/em>\u00a0and the form of the relationship is\u00a0<em>linear<\/em>. What about the strength? 
Recall that the strength of a relationship is the extent to which the data follow its form.<\/p>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div id=\"h5p-43\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-43\" class=\"h5p-iframe\" data-content-id=\"43\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"3.4 Learn by doing 1\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<p>The purpose of this example was to illustrate how assessing the strength of the linear relationship from a scatterplot alone is problematic, since our judgment might be affected by the scale on which the values are plotted. This example, therefore, provides a motivation for the\u00a0<em>need\u00a0<\/em>to supplement the scatterplot with a\u00a0<em>numerical measure<\/em>\u00a0that will\u00a0<em>measure the strength<\/em>\u00a0of the linear relationship between two quantitative variables.<\/p>\n<div class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">The Correlation Coefficient\u2014r<\/span><\/h2>\n<p id=\"N10B06\">The numerical measure that assesses the strength of a linear relationship is called the\u00a0<em>correlation coefficient<\/em>\u00a0and is denoted by\u00a0<em>r<\/em>. 
We will<\/p>\n<ul>\n<li>Define the correlation r.<\/li>\n<li>Discuss the calculation of r.<\/li>\n<li>Explain how to interpret the value of r.<\/li>\n<li>Talk about some of the properties of r.<\/li>\n<\/ul>\n<p><em>Definition:\u00a0<\/em>The correlation coefficient (r) is a numerical measure of the\u00a0<em>strength<\/em>\u00a0and\u00a0<em>direction<\/em>\u00a0of a linear relationship between two quantitative variables.<\/p>\n<p><em>Calculation:\u00a0<\/em>r is calculated using the following formula: [latex]r=\\frac{1}{n-1}\\sum_{i=1}^{n}\\left(\\frac{x_i-\\bar{x}}{s_x}\\right)\\left(\\frac{y_i-\\bar{y}}{s_y}\\right)[\/latex] where [latex]\\bar{x}[\/latex] and [latex]\\bar{y}[\/latex] are the sample means, [latex]s_x[\/latex] and [latex]s_y[\/latex] are the sample standard deviations, and n is the number of data points.<\/p>\n<p id=\"N10BD5\">However, the calculation of the correlation (r) is not the focus of this course. We will use a statistics package to calculate r for us, and the emphasis of this course is on the\u00a0<em class=\"italic\">interpretation<\/em>\u00a0of its value.<\/p>\n<\/div>\n<\/div>\n<div id=\"N10BDE\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Interpretation<\/span><\/h2>\n<p id=\"N10BE5\">Once we obtain the value of r, its interpretation with respect to the strength of linear relationships is quite simple, as this walkthrough illustrates:<\/p>\n<p><iframe loading=\"lazy\" id=\"oembed-1\" title=\"Interpreting the value of r\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube.com\/embed\/Bt-Ey2ebfvs?feature=oembed&#38;rel=0&#38;rel=0\" frameborder=\"0\" allowfullscreen=\"allowfullscreen\"><\/iframe><\/p>\n<div class=\"figurewrap\">\n<div class=\"figure clearfix\">\n<div id=\"uwrap__i_0\" class=\"youtube\">\n<p><span style=\"text-align: initial; font-size: 1em;\">To get a better sense of how the value of r relates to the strength of the linear relationship, take a look at the activity 
below.<\/span><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"sectionContain\">\n<p><iframe loading=\"lazy\" id=\"_i_1\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/webcontent\/html_activities\/linear_relationships_2_of_8\/linear_relationships_scatterplot.html\" width=\"750\" height=\"700\" frameborder=\"0\" data-gtm-yt-inspected-6=\"true\"><\/iframe><\/p>\n<p id=\"N10C01\">The slider bar at the top of the HTML activity allows us to vary the value of the correlation coefficient (r) between \u22121 and 1 in order to observe the effect on a scatterplot. Click the\u00a0<em class=\"bold\">Switch Sign<\/em>\u00a0button to change the sign of the correlation (positive or negative) while keeping the value the same.<\/p>\n<p>Now that we understand the use of\u00a0<em>r<\/em>\u00a0as a numerical measure for assessing the direction and strength of linear relationships between quantitative variables, we will look at a few examples.<\/p>\n<div class=\"examplewrap\">\n<div class=\"exHead\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div class=\"examplewrap\">\n<h4 class=\"exHead\">Highway Sign Visibility<\/h4>\n<div class=\"example clearfix\">\n<div>\n<p id=\"N10B13\">Earlier, we used the scatterplot below to find a\u00a0<em>negative linear<\/em>\u00a0relationship between the age of a driver and the maximum distance at which a highway sign was legible. What about the strength of the relationship? It turns out that the correlation between the two variables is r = -0.793.<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"The scatterplot for highway sign visibility. 
The vertical axis is labeled &quot;Sign Legibility Distance (feet)&quot; and the horizontal axis is labeled &quot;Driver Age (years)&quot;\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/linear3.gif\" alt=\"The scatterplot for highway sign visibility. The vertical axis is labeled &quot;Sign Legibility Distance (feet)&quot; and the horizontal axis is labeled &quot;Driver Age (years)&quot;\" \/><\/span><\/span><\/p>\n<p id=\"N10B1F\">Since r &lt; 0, it confirms that the direction of the relationship is negative (although we really didn\u2019t need r to tell us that). Since\u00a0<em>r<\/em>\u00a0is relatively close to -1, it suggests that the relationship is moderately strong. In context, the negative correlation confirms that the maximum distance at which a sign is legible generally decreases with age. Since the value of r indicates that the linear relationship is moderately strong, but not perfect, we can expect the maximum distance to vary somewhat, even among drivers of the same age.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"examplewrap\">\n<div class=\"exHead\"><span class=\"scnReader\">Example<\/span><\/div>\n<div class=\"example clearfix\">\n<h4>Statistics Courses<\/h4>\n<div>\n<p id=\"N10B2B\">A statistics department is interested in tracking the progress of its students from entry until graduation. As part of the study, the department tabulates the performance of 10 students in an introductory course and in an upper-level course required for graduation. What is the relationship between the students\u2019 course averages in the two courses? 
Here is the scatterplot for the data:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"A scatterplot for the data, in which the vertical axis is labeled &quot;Upper Level Course Average&quot; and the horizontal axis is labeled &quot;Introductory Course Average&quot;.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/linear4.gif\" alt=\"A scatterplot for the data, in which the vertical axis is labeled &quot;Upper Level Course Average&quot; and the horizontal axis is labeled &quot;Introductory Course Average&quot;.\" \/><\/span><\/span><\/p>\n<p id=\"N10B34\">The scatterplot suggests a relationship that is\u00a0<em>positive<\/em>\u00a0in direction,\u00a0<em>linear<\/em>\u00a0in form, and seems quite strong. The value of the correlation that we find between the two variables is\u00a0<em>r<\/em>\u00a0= 0.931, which is very close to 1, and thus confirms that indeed the linear relationship is very strong.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p>Comment<\/p>\n<\/div>\n<\/div>\n<div class=\"section purposewrap\">\n<div class=\"sectionContain\">\n<p id=\"N10B48\">Note that in both examples we supplemented the scatterplot with the correlation (r). Now that we have the correlation (r), why do we still need to look at a scatterplot when examining the relationship between two quantitative variables?<\/p>\n<p id=\"N10B4B\">The\u00a0<em>correlation<\/em>\u00a0coefficient can\u00a0<em>only<\/em>\u00a0be interpreted as the\u00a0<em>measure of the strength of a linear relationship<\/em>, so we need the scatterplot to verify that the relationship indeed looks linear. 
This point and its importance will be clearer after we examine a few properties of r.<\/p>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p><img decoding=\"async\" id=\"N10077\" class=\"img-responsive popimg aligncenter\" title=\"scatterplot graph\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/webcontent\/flash\/didigetthis_linear3q1_image1.gif\" alt=\"scatterplot graph\" \/><\/p>\n<p style=\"text-align: center;\">\n<div id=\"h5p-44\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-44\" class=\"h5p-iframe\" data-content-id=\"44\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"3.3 Did I get this 1\"><\/iframe><\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/webcontent\/flash\/didigetthis_linear3q1_image2.gif\" alt=\"\" \/><\/p>\n<p style=\"text-align: center;\">\n<div id=\"h5p-45\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-45\" class=\"h5p-iframe\" data-content-id=\"45\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"3.3 Did I get this 2\"><\/iframe><\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/webcontent\/flash\/didigetthis_linear3q1_image3.gif\" alt=\"\" \/><\/p>\n<p style=\"text-align: center;\">\n<div id=\"h5p-46\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-46\" class=\"h5p-iframe\" data-content-id=\"46\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"Question 
3\"><\/iframe><\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/webcontent\/flash\/didigetthis_linear3q1_image4.gif\" alt=\"\" \/><\/p>\n<p style=\"text-align: center;\">\n<div id=\"h5p-47\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-47\" class=\"h5p-iframe\" data-content-id=\"47\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"3.4 Learn by doing 2\"><\/iframe><\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/webcontent\/flash\/didigetthis_linear3q1_image5.gif\" alt=\"\" \/><\/p>\n<p style=\"text-align: center;\">\n<div id=\"h5p-48\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-48\" class=\"h5p-iframe\" data-content-id=\"48\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"3.3 Did I get this 3\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<h2><span title=\"Quick scroll up\">Properties of r<\/span><\/h2>\n<p id=\"e96c9a3bd4a64bcfba3e0ff8f5cbb287\">We now discuss and illustrate several important properties of the correlation coefficient as a numerical measure of the strength of a linear relationship.<\/p>\n<ol id=\"e3dd58df5cd14b99a6f0e4c442aac425\">\n<li>\n<p id=\"ae36bbef823c4cd2bac8e30e962e48e1\">The correlation does not change when the units of measurement of either one of the variables change. 
In other words, if we change the units of measurement of the explanatory variable and\/or the response variable, the change has\u00a0<em class=\"italic\">no effect on the correlation (r)<\/em>.<\/p>\n<p id=\"f5c506e9b3ee4265a7c947828b9b72bf\">To illustrate, here are two versions of the scatterplot of the relationship between sign legibility distance and driver\u2019s age:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"afcc9e2dabf648aca55bc2e5c7650136\" class=\"img-responsive popimg aligncenter\" title=\"Two scatterplots showing the data for Driver Age vs. Sign Legibility. The first scatterplot&apos;s vertical axis is labeled &quot;Sign Legibility Distance (feet)&quot; and the axis ranges from a little less than 300 to 600 feet. The horizontal axis is labeled &quot;Driver Age (years)&quot; and it ranges from 15 to 85. The second scatterplot has the same horizontal axis but the vertical axis is labeled &quot;Sign Legibility Distance (meters)&quot; and it ranges from 80 to 180.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/linear5.gif\" alt=\"Two scatterplots showing the data for Driver Age vs. Sign Legibility. The first scatterplot&apos;s vertical axis is labeled &quot;Sign Legibility Distance (feet)&quot; and the axis ranges from a little less than 300 to 600 feet. The horizontal axis is labeled &quot;Driver Age (years)&quot; and it ranges from 15 to 85. The second scatterplot has the same horizontal axis but the vertical axis is labeled &quot;Sign Legibility Distance (meters)&quot; and it ranges from 80 to 180.\" \/><\/span><\/span><\/p>\n<p id=\"cd53e966f9ec4c2eae76efae773274b7\">The top scatterplot displays the original data where the maximum distance is measured in\u00a0<em class=\"italic\">feet<\/em>. 
The bottom scatterplot displays the same relationship but with maximum distances changed to\u00a0<em class=\"italic\">meters<\/em>. Notice that the Y-values have changed, but the correlations are the same. This example illustrates how changing the units of measurement of the response variable has no effect on r, but as we indicated above, the same is true for changing the units of the explanatory variable, or of both variables.<\/p>\n<p id=\"e3010178737a4d10adec55c54cab0e76\">This might be a good place to comment that the correlation (r) is\u00a0<em class=\"italic\">unitless<\/em>. It is just a number.<\/p>\n<\/li>\n<li>\n<p id=\"be35cfe930a548a5b372a084fb89db6f\">The correlation measures only the\u00a0<em class=\"italic\">strength<\/em>\u00a0of a linear relationship between two variables. It\u00a0<em class=\"italic\">ignores<\/em>\u00a0any other type of relationship, no matter how strong it is. For example, consider the relationship between the amount of fuel a car uses to drive a fixed distance and the speed at which the car is driven:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"b6b8956f127447b49d8b765bc1cc24d2\" class=\"img-responsive popimg aligncenter\" title=\"A scatterplot in which the vertical axis is labeled &quot;Fuel Used (liter\/100km)&quot; and the horizontal axis is labeled &quot;Speed (km\/h).&quot; The amount of fuel used rapidly decreases from speed=0 to about speed=60, where fuel used reaches its minimum value, then fuel used slowly increases linearly as speed increases.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/linear6.gif\" alt=\"A scatterplot in which the vertical axis is labeled &quot;Fuel Used (liter\/100km)&quot; and the horizontal axis is labeled &quot;Speed (km\/h).&quot; The amount of fuel used rapidly decreases from speed=0 to about speed=60, where fuel 
used reaches its minimum value, then fuel used slowly increases linearly as speed increases.\" \/><\/span><\/span><\/p>\n<p id=\"b70dd5fff0574d0eab703b84bfac1b10\">Our data describe a fairly simple curvilinear relationship: the amount of fuel consumed decreases rapidly to a minimum for a car driving 60 kilometers per hour, and then increases gradually for speeds exceeding 60 kilometers per hour. The relationship is very strong, as the observations seem to perfectly fit the curve.<\/p>\n<p id=\"f5b45c622e6a4e6081f60907d4f9be02\">Although the relationship is strong, the correlation r = -0.172 indicates a weak\u00a0<em class=\"italic\">linear<\/em>\u00a0relationship. This makes sense considering that the data fail to adhere closely to a linear form:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"dca170fdf7f9479eb5f7a001eee62212\" class=\"img-responsive popimg aligncenter\" title=\"The same scatterplot, except a blue line with arrow has been drawn over the plot, in the direction of a negative relationship. The data plotted does not align at all with this arrow.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/linear7.gif\" alt=\"The same scatterplot, except a blue line with arrow has been drawn over the plot, in the direction of a negative relationship. The data plotted does not align at all with this arrow.\" \/><\/span><\/span><\/p>\n<p id=\"fc330a5485194ed286c32b5158529169\">The correlation is useless for assessing the strength of any type of relationship that is not linear (including relationships that are curvilinear, such as the one in our example). Beware, then, of interpreting the fact that r is close to 0 as an indicator of a weak relationship rather than a weak\u00a0<em class=\"italic\">linear<\/em>\u00a0relationship. 
This example also illustrates how important it is to\u00a0<em class=\"italic\">always look at the data in the scatterplot<\/em>\u00a0because, as in our example, there might be a strong nonlinear relationship that r does not indicate.<\/p>\n<p id=\"f9ca185e14494143889b8643aabf20e1\">Since the correlation was nearly zero when the form of the relationship was not linear, we might ask if the correlation can be used to determine whether or not a relationship is linear.<\/p>\n<\/li>\n<li>\n<p id=\"bd31ea0fa982419d8de55562e50e178e\">The correlation by itself is\u00a0<em class=\"italic\">not<\/em>\u00a0sufficient to determine whether a relationship is linear. To see this, let\u2019s consider the study that examined the effect of monetary incentives on the return rate of questionnaires. Below is the scatterplot relating the percentage of participants who completed a survey to the monetary incentive that researchers promised to participants, in which we find a\u00a0<em class=\"italic\">strong curvilinear relationship<\/em>:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"bfe57e88456c49faba68897bf611a883\" class=\"img-responsive popimg aligncenter\" title=\"A scatterplot in which the vertical axis shows &quot;Percentage Returned&quot; and the horizontal axis shows &quot;Incentive (dollars). &quot; The data plotted shows a strong curvilinear relationship which roughly approximates a square root function.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/linear8.gif\" alt=\"A scatterplot in which the vertical axis shows &quot;Percentage Returned&quot; and the horizontal axis shows &quot;Incentive (dollars). 
&quot; The data plotted shows a strong curvilinear relationship which roughly approximates a square root function.\" \/><\/span><\/span><\/p>\n<p id=\"c4bc026658ae41cdbd321c8c2c8d261a\">The relationship is curvilinear, yet the correlation r = 0.876 is quite close to 1.<\/p>\n<p id=\"fdbccafb661e4ad186eae6fe5ed9215a\">In the last two examples, we have seen two very strong curvilinear relationships, one with a correlation close to 0 and one with a correlation close to 1. Therefore, the correlation alone does not indicate whether a relationship is linear. The important principle here is:<\/p>\n<p id=\"a74e2a665316441a89edbfc0215e93f0\"><em class=\"italic\">Always look at the data!<\/em><\/p>\n<\/li>\n<li>\n<p id=\"c0eb3ac1e6d44b95b01f1de71076ab7d\">The correlation is heavily influenced by outliers. As you will learn in the next two activities, the way in which the outlier influences the correlation depends on whether the outlier is consistent with the pattern of the linear relationship.<\/p>\n<p id=\"c33062d40346484d99886ba893020043\">Using the applet below, we explore how an outlier affects the correlation.<\/p>\n<p><iframe loading=\"lazy\" id=\"c1c8fc63202240a880ede1d58d0287db\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/webcontent\/html_activities\/linear_relationships_scatterplot\/linear_relationships_scatterplot.html\" width=\"750\" height=\"700\" frameborder=\"0\" data-gtm-yt-inspected-6=\"true\" data-mce-fragment=\"1\"><\/iframe><\/p>\n<p id=\"fe59fa688c7547dca7920b4bd83b8180\">To see how an outlier affects the correlation, do the following:<\/p>\n<ol id=\"c1ec20c94ed04f1bae60aa6cb2490bf2\">\n<li>\n<p id=\"bf8a070ca0684b6d80eda27505283c9b\">Fill the scatterplot with a hypothetical positive linear relationship between X and Y (by clicking on the graph about a dozen times starting at lower left and going up diagonally to the top right). 
Pay attention to the correlation coefficient calculated at the top right of the applet. (Clicking on the garbage can will let you start over.)<\/p>\n<\/li>\n<li>\n<p id=\"e6baf066d3ad4da7b52198023fbf97dd\">Once you are satisfied with your hypothetical data, create an outlier by clicking on one of the data points in the upper right of the graph, and dragging it down along the right side of the graph. Again, pay attention to what happens to the value of the correlation.<\/p>\n<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n<p id=\"b5b8e6d7a0fe4d70a8d02ef0008fe985\">Hopefully, you\u2019ve noticed the correlation decreasing when you created this kind of outlier, which\u00a0<em class=\"italic\">is not consistent<\/em> with the pattern of the relationship.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"author":150,"menu_order":4,"template":"","meta":{"pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[48],"contributor":[],"license":[],"class_list":["post-487","chapter","type-chapter","status-publish","hentry","chapter-type-numberless"],"part":417,"_links":{"self":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters\/487","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/users\/150"}],"version-history":[{"count":9,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters\/487\/revisions"}],"predecessor-version":[{"id":852,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters\/487\/revisions\/852"}],"part":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/parts\/417"}],"metadata":[{"href
":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters\/487\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/media?parent=487"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapter-type?post=487"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/contributor?post=487"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/license?post=487"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}