{"id":485,"date":"2024-10-18T01:59:51","date_gmt":"2024-10-18T01:59:51","guid":{"rendered":"https:\/\/pressbooks.ccconline.org\/mat1260\/?post_type=chapter&#038;p=485"},"modified":"2024-12-11T21:08:22","modified_gmt":"2024-12-11T21:08:22","slug":"3-2-scatterplots","status":"publish","type":"chapter","link":"https:\/\/pressbooks.ccconline.org\/mat1260\/chapter\/3-2-scatterplots\/","title":{"raw":"3.2: Scatterplots","rendered":"3.2: Scatterplots"},"content":{"raw":"<h2 data-element=\"title\">Scatterplot<\/h2>\r\n<p id=\"N10AF8\">In the previous two cases we had a categorical explanatory variable, and therefore exploring the relationship between the two variables was done by comparing the distribution of the response variable for each category of the explanatory variable:<\/p>\r\n\r\n<ul>\r\n \t<li>In case C\u2192Q we compared distributions of the quantitative response.<\/li>\r\n \t<li>In case C\u2192C we compared distributions of the categorical response.<\/li>\r\n<\/ul>\r\n<p id=\"N10B04\">Case Q\u2192Q is different in the sense that both variables (in particular the explanatory variable) are quantitative, and therefore, as you\u2019ll discover, this case will require a different kind of treatment and tools. Let\u2019s start with an example:<\/p>\r\n\r\n<div class=\"examplewrap\">\r\n<div class=\"exHead\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<h4 class=\"exHead\">Highway Signs<\/h4>\r\n<div class=\"example clearfix\">\r\n<div>\r\n<p id=\"N10B0C\">A Pennsylvania research firm conducted a study in which 30 drivers (of ages 18 to 82 years old) were sampled, and for each one, the maximum distance (in feet) at which he\/she could read a newly designed sign was determined. 
The goal of this study was to explore the relationship between a driver\u2019s\u00a0<em>age<\/em>\u00a0and the\u00a0<em>maximum distance<\/em>\u00a0at which signs were legible, and then use the study\u2019s findings to improve safety for older drivers. (Reference: Utts and Heckard,\u00a0<em class=\"italic\">Mind on Statistics<\/em>\u00a0(2002). Original source: Data collected by Last Resource, Inc, Bellefonte, PA.)<\/p>\r\n<p id=\"N10B19\">Since the purpose of this study is to explore the effect of age on maximum legibility distance,<\/p>\r\n\r\n<ul>\r\n \t<li>the\u00a0<em>explanatory<\/em>\u00a0variable is\u00a0<em>Age<\/em>, and<\/li>\r\n \t<li>the\u00a0<em>response<\/em>\u00a0variable is\u00a0<em>Distance<\/em>.<\/li>\r\n<\/ul>\r\n<p id=\"N10B31\">Here is what the raw data look like:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img class=\"img-responsive popimg aligncenter\" title=\"A table of the data. There are three columns, &quot;Driver&quot;, &quot;Age&quot;, and &quot;Distance&quot;. &quot;Age&quot; is the Explanatory variable, and &quot;Distance&quot; is the Response variable. Some example data: Driver 1, 18, 510; Driver 2, 32, 410; Driver 3, 55, 420; Driver 4, 23, 510; ... (abbreviated) ... Driver 30, 82, 360;\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot2.gif\" alt=\"A table of the data. There are three columns, &quot;Driver&quot;, &quot;Age&quot;, and &quot;Distance&quot;. &quot;Age&quot; is the Explanatory variable, and &quot;Distance&quot; is the Response variable. Some example data: Driver 1, 18, 510; Driver 2, 32, 410; Driver 3, 55, 420; Driver 4, 23, 510; ... (abbreviated) ... 
Driver 30, 82, 360;\" \/><\/span><\/span>\r\n<p id=\"N10B3A\">Note that the data structure is such that for each individual (in this case driver 1 through driver 30) we have a pair of values (in this case representing the driver\u2019s age and distance). We can therefore think about these data as 30 pairs of values: (18, 510), (32, 410), (55, 420), \u2026 , (82, 360).<\/p>\r\n<p id=\"N10B3D\">The first step in exploring the relationship between driver age and sign legibility distance is to create an appropriate and informative graphical display. The appropriate graphical display for examining the relationship between two quantitative variables is the\u00a0<em>scatterplot<\/em>. Here is how a scatterplot is constructed for our example:<\/p>\r\n<p id=\"N10B43\">To create a scatterplot, each pair of values is plotted, so that the value of the explanatory variable (X) is plotted on the horizontal axis, and the value of the response variable (Y) is plotted on the vertical axis. In other words, each individual (driver, in our example) appears on the scatterplot as a single point whose X-coordinate is the value of the explanatory variable for that individual, and whose Y-coordinate is the value of the response variable. Here is an illustration:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"_i_1\" class=\"img-responsive popimg aligncenter\" title=\"A scatterplot. The Y-Axis is labeled &quot;Sign Legibility Distance (feet)&quot;, and the X-axis is labeled &quot;Driver Age (years)&quot; Already plotted is Driver 1's data point. It is located at x=18,y=510. Also, Driver 2's data point has been plotted, at x=32,y=410.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot3.gif\" alt=\"A scatterplot. 
The Y-Axis is labeled &quot;Sign Legibility Distance (feet)&quot;, and the X-axis is labeled &quot;Driver Age (years)&quot; Already plotted is Driver 1's data point. It is located at x=18,y=510. Also, Driver 2's data point has been plotted, at x=32,y=410.\" \/><\/span><\/span>\r\n\r\n&nbsp;\r\n\r\nAnd here is the completed scatterplot:\r\n\r\n&nbsp;\r\n\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"_i_2\" class=\"img-responsive popimg aligncenter\" title=\"The completed scatterplot. There are 30 data points, shown as black dots scattered about.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot4.gif\" alt=\"The completed scatterplot. There are 30 data points, shown as black dots scattered about.\" \/><\/span><\/span>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<h2>Comment<\/h2>\r\n<\/div>\r\n<\/div>\r\n<div id=\"N10B53\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<p id=\"N10B5A\">It is important to mention again that when creating a scatterplot, the explanatory variable should always be plotted on the horizontal X-axis, and the response variable should be plotted on the vertical Y-axis. If in a specific example we do not have a clear distinction between explanatory and response variables, each of the variables can be plotted on either axis.<\/p>\r\n\r\n<h2><span title=\"Quick scroll up\">Interpreting the scatterplot<\/span><\/h2>\r\n<p id=\"e38a49531d034a048c6f154e7a603e00\">How do we explore the relationship between two quantitative variables using the scatterplot? 
What should we look at, or pay attention to?<\/p>\r\n<p id=\"eb40d5eae01e407595a37fb125836f49\">Recall that when we described the distribution of a single quantitative variable with a histogram, we described the overall pattern of the distribution (shape, center, spread) and any deviations from that pattern (outliers).\u00a0<em>We do the same thing with the scatterplot.<\/em>\u00a0The following figure summarizes this point:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"d3face6cbb834388b49ac1efc9610242\" class=\"img-responsive popimg aligncenter\" title=\"When describing the relationship between two quantitative variables using a scatterplot, we look at: (1) The overall pattern, which can be described using direction, form, and strength. We also look at (2) Deviations from the pattern, which result from outliers.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot5.gif\" alt=\"When describing the relationship between two quantitative variables using a scatterplot, we look at: (1) The overall pattern, which can be described using direction, form, and strength. We also look at (2) Deviations from the pattern, which result from outliers.\" \/><\/span><\/span>\r\n<p id=\"a623263338be4a7c99041f3864f363fa\">As the figure explains, when describing the\u00a0<em>overall pattern<\/em>\u00a0of the relationship we look at its direction, form and strength.<\/p>\r\n\r\n<ul id=\"fb94ef944962411c97d384fd462c0467\">\r\n \t<li>\r\n<p id=\"cde5b71db4fe4f7e81345591b75df30f\">The\u00a0<em>direction<\/em>\u00a0of the relationship can be positive, negative, or neither:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"b7c4914db39245cba7fcfdb462264d51\" class=\"img-responsive popimg aligncenter\" title=\"A positive relationship. In this case, as we move from left to right along the x-axis, the y values tend to increase. 
In other words, the points on the scatterplot appear to be in a shape which roughly resembles a line going from the bottom left to the top right of the scatter plot. Note that the points are not necessarily actually in a line, but instead their general shape appears to create a trend.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot6.gif\" alt=\"A positive relationship. In this case, as we move from left to right along the x-axis, the y values tend to increase. In other words, the points on the scatterplot appear to be in a shape which roughly resembles a line going from the bottom left to the top right of the scatter plot. Note that the points are not necessarily actually in a line, but instead their general shape appears to create a trend.\" \/><\/span><\/span><span class=\"imagewrap\"><span class=\"image\"><img id=\"e34e566e5c124e26adcedf96ed2d706b\" class=\"img-responsive popimg aligncenter\" title=\"A negative relationship. In this case, as we move from left to right along the x-axis, the y values tend to decrease. In other words, the points on the scatterplot appear to be in a shape which roughly resembles a line going from the top left to the bottom right of the scatter plot. Just as in the positive relationship, the points do not have to be on this imaginary line, they merely appear to create a trend that goes from the upper left to lower right.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot7.gif\" alt=\"A negative relationship. In this case, as we move from left to right along the x-axis, the y values tend to decrease. In other words, the points on the scatterplot appear to be in a shape which roughly resembles a line going from the top left to the bottom right of the scatter plot. 
Just as in the positive relationship, the points do not have to be on this imaginary line, they merely appear to create a trend that goes from the upper left to lower right.\" \/><\/span><\/span><span class=\"imagewrap\"><span class=\"image\"><img id=\"c39f3a8d67044b53bb2d21feca7f6abf\" class=\"img-responsive popimg aligncenter\" title=\"Neither positive nor negative. The points here do not create a trend like in the positive or negative cases. In this example, they seem to create a V shape.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot8.gif\" alt=\"Neither positive nor negative. The points here do not create a trend like in the positive or negative cases. In this example, they seem to create a V shape.\" \/><\/span><\/span>\r\n<p id=\"baa10e6e3a6a4c34bb3b4b7f73f1e59d\">A\u00a0<em>positive (or increasing) relationship<\/em>\u00a0means that an increase in one of the variables is associated with an increase in the other.<\/p>\r\n<p id=\"b12829196d5a4e578bc819abfb738b68\">A\u00a0<em>negative (or decreasing) relationship<\/em>\u00a0means that an increase in one of the variables is associated with a decrease in the other.<\/p>\r\n<p id=\"b482bf94be934904bfb718c4ec91098c\">Not all relationships can be classified as either positive or negative.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"d723c0777ab246ebaf3839fe78fbcbbe\">The\u00a0<em>form<\/em>\u00a0of the relationship is its general shape. When identifying the form, we try to find the simplest way to describe the shape of the scatterplot. There are many possible forms. 
Here are a couple that are quite common:<\/p>\r\n<p id=\"c8e3efac6d114a038265902118d3f659\">Relationships with a\u00a0<em>linear<\/em>\u00a0form are most simply described as points scattered about a line:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"b2c9e14b7a324828b3dd175745bc87b3\" class=\"img-responsive popimg aligncenter\" title=\"A scatterplot in which the points are slightly above or below a line which has been drawn through the points. Overall, the points create a shape that appears to be a fat line. In this example, the points create a negative relationship.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot9.gif\" alt=\"A scatterplot in which the points are slightly above or below a line which has been drawn through the points. Overall, the points create a shape that appears to be a fat line. In this example, the points create a negative relationship.\" \/><\/span><\/span>\r\n<p id=\"ea147ac517064da28e1af4afb1e6f095\">Relationships with a\u00a0<em>curvilinear<\/em>\u00a0form are most simply described as points dispersed around the same curved line:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"fff08e7678044cd0bada5f14c16ca265\" class=\"img-responsive popimg aligncenter\" title=\"Here, the points in the scatterplot are slightly above or below a line which curves.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot10.gif\" alt=\"Here, the points in the scatterplot are slightly above or below a line which curves.\" \/><\/span><\/span>\r\n<p id=\"a8212e214d7b46e382c28fd402e9640f\">There are many other possible forms for the relationship between two quantitative variables, but linear and curvilinear forms are quite common and easy to identify. 
Another form-related pattern that we should be aware of is clusters in the data:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"e56672e33e144a5996e87e06bfff2031\" class=\"img-responsive popimg aligncenter\" title=\"The points in this scatterplot create two groups. The points in a group are close together, and in between the two groups is an empty space in which there are no points. These groups are clusters.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot11.gif\" alt=\"The points in this scatterplot create two groups. The points in a group are close together, and in between the two groups is an empty space in which there are no points. These groups are clusters.\" \/><\/span><\/span><\/li>\r\n \t<li>\r\n<p id=\"f6cb186997e74cd5a8762cf3bdc50cf9\">The\u00a0<em>strength<\/em>\u00a0of the relationship is determined by how closely the data follow the form of the relationship. Let\u2019s look, for example, at the following two scatterplots displaying positive, linear relationships:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"cb5dee43141c4af8b9cb7fb36651c204\" class=\"img-responsive popimg aligncenter\" title=\"A line has been drawn through the points in this scatter plot. The points are very close to the line. This is a strong relationship.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot12.gif\" alt=\"A line has been drawn through the points in this scatter plot. The points are very close to the line. This is a strong relationship.\" \/><\/span><\/span><span class=\"imagewrap\"><span class=\"image\"><img id=\"afc50c2252024015b04235342bf67e73\" class=\"img-responsive popimg aligncenter\" title=\"Like the previous example, a line has been drawn through the points in the scatterplot. 
However, there are points both close to the line and quite far. Taken as a whole, the points appear to make a trapezoid instead of a line. This is a weaker relationship than the previous example.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot13.gif\" alt=\"Like the previous example, a line has been drawn through the points in the scatterplot. However, there are points both close to the line and quite far. Taken as a whole, the points appear to make a trapezoid instead of a line. This is a weaker relationship than the previous example.\" \/><\/span><\/span>\r\n<p id=\"a519c174fdd34adbb2c0540fd37ecd48\">The strength of the relationship is determined by how closely the data points follow the form. We can see that in the top scatterplot the data points follow the linear pattern quite closely. This is an example of a strong relationship. In the bottom scatterplot, the points also follow the linear pattern, but much less closely, and therefore we can say that the relationship is weaker. In general, though, assessing the strength of a relationship just by looking at the scatterplot is quite problematic, and we need a numerical measure to help us with that. We will discuss that later in this section.<\/p>\r\n<\/li>\r\n<\/ul>\r\n<p id=\"a5a251a525744783bc337238ca86be13\">Data points that\u00a0<em>deviate from the pattern<\/em>\u00a0of the relationship are called\u00a0<em>outliers<\/em>. We will see several examples of outliers during this section. Two outliers are illustrated in the scatterplot below:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"d945790b4175450d93809b674091b450\" class=\"img-responsive popimg aligncenter\" title=\"A scatterplot which has a positive relationship. Most of the points are in a line-like shape from the bottom left of the plot to the top right. However, there are two points which do not match this trend. 
One is far below the majority of the points and the other is far left. These points do not participate in the line-like shape at all.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot14.gif\" alt=\"A scatterplot which has a positive relationship. Most of the points are in a line-like shape from the bottom left of the plot to the top right. However, there are two points which do not match this trend. One is far below the majority of the points and the other is far left. These points do not participate in the line-like shape at all.\" \/><\/span><\/span>\r\n\r\nLet\u2019s go back now to our example, and use the scatterplot to examine the relationship between the age of the driver and the maximum sign legibility distance. Here is the scatterplot:\r\n\r\n<span class=\"imagewrap\"><span class=\"image\"><img class=\"img-responsive popimg aligncenter\" title=\"A scatterplot. The Y-Axis is labeled &quot;Sign Legibility Distance (feet)&quot;, and the X-axis is labeled &quot;Driver Age (years)&quot; There are 30 points on the plot.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot15.gif\" alt=\"A scatterplot. The Y-Axis is labeled &quot;Sign Legibility Distance (feet)&quot;, and the X-axis is labeled &quot;Driver Age (years)&quot; There are 30 points on the plot.\" \/><\/span><\/span>\r\n<p id=\"N10AFF\">The direction of the relationship is\u00a0<em>negative<\/em>, which makes sense in context, since as you get older your eyesight weakens, and in particular older drivers tend to be able to read signs only at lesser distances. 
An arrow drawn over the scatterplot illustrates the negative direction of this relationship:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img class=\"img-responsive popimg aligncenter\" title=\"A line from the upper left of the plot to the lower right has been drawn. The points of the scatterplot roughly follow this line.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot16.gif\" alt=\"A line from the upper left of the plot to the lower right has been drawn. The points of the scatterplot roughly follow this line.\" \/><\/span><\/span>\r\n<p id=\"N10B0A\">The form of the relationship seems to be\u00a0<em>linear<\/em>. Notice how the points tend to be scattered about the line. Although, as we mentioned earlier, it is problematic to assess the strength without a numerical measure, the relationship appears to be\u00a0<em>moderately strong<\/em>, as the data are fairly tightly scattered about the line. Finally, all the data points seem to \u201cobey\u201d the pattern\u2014there\u00a0<em>do not appear to be any outliers<\/em>.<\/p>\r\n<p id=\"N10B00\">We will now look at two more examples:<\/p>\r\n\r\n<div class=\"examplewrap\">\r\n<div class=\"exHead\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<h4 class=\"exHead\">Average Gestation Period<\/h4>\r\n<div class=\"example clearfix\">\r\n<div>\r\n<p id=\"N10B08\">The average gestation period, or time of pregnancy, of an animal is closely related to its longevity (the length of its lifespan). Data on the average gestation period and longevity (in captivity) of 40 different species of animals have been examined, with the purpose of examining how the gestation period of an animal is related to (or can be predicted from) its longevity. (Source: Rossman and Chance. (2001). 
Workshop statistics: Discovery with data and Minitab. Original source: The 1993 world almanac and book of facts).<\/p>\r\n<p id=\"N10B0B\">Here is the scatterplot of the data.<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img class=\"img-responsive popimg aligncenter\" title=\"A scatterplot in which the vertical axis is labeled &quot;Gestation (days)&quot; and it ranges from 0 to 700 days. The horizontal axis is labeled &quot;Longevity (years)&quot; and it ranges from 0 to 40 years.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot17.gif\" alt=\"A scatterplot in which the vertical axis is labeled &quot;Gestation (days)&quot; and it ranges from 0 to 700 days. The horizontal axis is labeled &quot;Longevity (years)&quot; and it ranges from 0 to 40 years.\" \/><\/span><\/span>\r\n<p id=\"N10B14\">What can we learn about the relationship from the scatterplot? The direction of the relationship is\u00a0<em>positive<\/em>, which means that animals with longer life spans tend to have longer times of pregnancy (this makes intuitive sense). An arrow drawn over the scatterplot below illustrates this:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img class=\"img-responsive popimg aligncenter\" title=\"The same scatterplot with a line and arrow drawn from the lower left to the upper right corners of the plot. Every point of data is confined to x\u226426 and y\u2264500, but there is one point at roughly x=40 and y=650 which is an outlier. There are also two red vertical lines at x=5 and x=12 which will be explained.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot18.gif\" alt=\"The same scatterplot with a line and arrow drawn from the lower left to the upper right corners of the plot. 
Every point of data is confined to x\u226426 and y\u2264500, but there is one point at roughly x=40 and y=650 which is an outlier. There are also two red vertical lines at x=5 and x=12 which will be explained.\" \/><\/span><\/span>\r\n<p id=\"N10B20\">The form of the relationship is again essentially\u00a0<em>linear<\/em>. There appears to be\u00a0<em>one outlier<\/em>, indicating an animal with an exceptionally long lifespan and gestation period. (This animal happens to be the elephant.) Note that while this outlier definitely deviates from the rest of the data in terms of its magnitude, it\u00a0<em>does<\/em>\u00a0follow the direction of the data.<\/p>\r\n<p id=\"N10B2C\"><em>Comment:<\/em>\u00a0Another feature of the scatterplot that is worth observing is how the variation in gestation increases as longevity increases. This fact is illustrated by the two red vertical lines at the bottom left part of the graph. Note that the gestation periods for animals who live 5 years range from about 30 days up to about 120 days. On the other hand, the gestation period of animals who live 12 years varies much more, and ranges from about 60 days up to more than 400 days.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n&nbsp;\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"examplewrap\">\r\n<div class=\"exHead\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<div class=\"examplewrap\">\r\n<h4 class=\"exHead\">Fuel Usage<\/h4>\r\n<div class=\"example clearfix\">\r\n<div>\r\n<p id=\"N10B38\">As a third example, consider the relationship between the average amount of fuel used (in liters) to drive a fixed distance in a car (100 kilometers), and the speed at which the car is driven (in kilometers per hour). (Source: Moore and McCabe, (2003). Introduction to the practice of statistics. Original source: T.N. Lam. (1985). 
\u201cEstimating fuel consumption for engine size,\u201d Journal of Transportation Engineering, vol. 111)<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img class=\"img-responsive popimg aligncenter\" title=\"A scatterplot of fuel usage in relation to speed. The vertical axis is labeled &quot;Fuel Used (liters\/100km)&quot; and the Horizontal axis is labeled &quot;Speed (km\/h)&quot;\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot19.gif\" alt=\"A scatterplot of fuel usage in relation to speed. The vertical axis is labeled &quot;Fuel Used (liters\/100km)&quot; and the Horizontal axis is labeled &quot;Speed (km\/h)&quot;\" \/><\/span><\/span>\r\n<p id=\"N10B41\">The data describe a relationship that decreases and then increases\u2014the amount of fuel consumed decreases rapidly to a minimum for a car driving 60 kilometers per hour, and then increases gradually for speeds exceeding 60 kilometers per hour. This suggests that the speed at which a car economizes on fuel the most is about 60 km\/h. This forms a curvilinear relationship that seems to be very strong, as the observations seem to perfectly fit the curve. Finally, there do not appear to be any outliers.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p id=\"N10075\">A study examined how the percentage of participants who completed a survey is related to the monetary incentive that researchers promised to participants. 
Consider the relationship between these two quantitative variables, displayed in the scatterplot below.<\/p>\r\n&nbsp;\r\n<div class=\"image shouldbeleft\"><img id=\"N10077\" class=\"img-responsive popimg aligncenter\" title=\"scatterplot graph\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/webcontent\/flash\/didigetthis_scatterplot4q1image1.gif\" alt=\"scatterplot graph\" \/><\/div>\r\n<div><\/div>\r\n<div>[h5p id=\"42\"]<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div id=\"N10B68\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Comment<\/span><\/h2>\r\n<p id=\"N10B6F\">The example in the last activity provides a great opportunity for interpretation of the form of the relationship in context. Recall that the example examined how the percentage of participants who completed a survey is affected by the monetary incentive that researchers promised to participants. Here again is the scatterplot that displays the relationship:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"_i_3\" class=\"img-responsive popimg aligncenter\" title=\"A scatterplot. The vertical axis is labeled &quot;Percentage Returned&quot; and the Horizontal Axis is labeled &quot;Incentive (dollars)&quot; The shown data closely follows a curved line which grows more quickly at lower values of dollars.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot25.gif\" alt=\"A scatterplot. 
The vertical axis is labeled &quot;Percentage Returned&quot; and the Horizontal Axis is labeled &quot;Incentive (dollars)&quot; The shown data closely follows a curved line which grows more quickly at lower values of dollars.\" \/><\/span><\/span>\r\n<p id=\"N10B78\">The positive relationship definitely makes sense in context, but what is the interpretation of the curvilinear form in the context of the problem? How can we explain (in context) the fact that the relationship seems at first to be increasing very rapidly, but then slows down? The following graph will help us:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"_i_4\" class=\"img-responsive popimg aligncenter\" title=\"The same scatterplot, except that some boxes have been drawn. The first box encompasses the area of the plot from x=0,y=0 to x=0,y=16. x=0,y=16 is the location of the first data point, showing that when the incentive is $0, the return rate is 16%. The next box encompasses the area from x=0,y=0 to x=10,y=43. This shows that when the incentive is $10, the return rate is 43%. The next box is the area between x=0,y=0 and x=30,y=54. The last box is from x=0,y=0 to x=40,y=57.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot26.gif\" alt=\"The same scatterplot, except that some boxes have been drawn. The first box encompasses the area of the plot from x=0,y=0 to x=0,y=16. x=0,y=16 is the location of the first data point, showing that when the incentive is $0, the return rate is 16%. The next box encompasses the area from x=0,y=0 to x=10,y=43. This shows that when the incentive is $10, the return rate is 43%. The next box is the area between x=0,y=0 and x=30,y=54. 
The last box is from x=0,y=0 to x=40,y=57.\" \/><\/span><\/span>\r\n<p id=\"N10B81\">Note that when the monetary incentive increases from $0 to $10, the percentage of returned surveys increases sharply\u2014an increase of 27 percentage points (from 16% to 43%). However, the same increase of $10 from $30 to $40 doesn\u2019t result in the same dramatic increase in the percentage of returned surveys\u2014it results in an increase of only 3 percentage points (from 54% to 57%). The form displays the phenomenon of \u201cdiminishing returns\u201d\u2014a return rate that after a certain point fails to increase proportionately to additional outlays of investment. $10 is worth more to people relative to $0 than it is relative to $30.<\/p>\r\n\r\n<h2><span title=\"Quick scroll up\">A Labeled Scatterplot<\/span><\/h2>\r\n<p id=\"N10B0E\">In certain circumstances, it may be reasonable to indicate different subgroups or categories within the data on the scatterplot, by labeling each subgroup differently. The result is called a\u00a0<em>labeled scatterplot<\/em>, and can provide further insight about the relationship we are exploring. Here is an example.<\/p>\r\n\r\n<div class=\"examplewrap\">\r\n<div class=\"exHead\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<div class=\"example clearfix\">\r\n<h4>Hot Dogs<\/h4>\r\n<div>\r\n\r\nRecall the hot dog example from case C\u2192Q, in which 54 major hot dog brands were examined. In this study, both the\u00a0<em>calorie content<\/em>\u00a0and the\u00a0<em>sodium level<\/em>\u00a0of each brand were recorded, as well as the\u00a0<em>type<\/em>\u00a0of hot dog: beef, poultry, and meat (mostly pork and beef, but up to 15% poultry meat). 
In this example we will explore the relationship between the sodium level and calorie content of hot dogs, and we will label the three different types of hot dogs to create a labeled scatterplot.\r\n<div>\r\n<h2 id=\"pagetitle\">Creating a Labeled Scatterplot<\/h2>\r\n<\/div>\r\n<div id=\"N10AC2\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<p id=\"N10AC9\">The scatterplot below displays the relationship between the sodium and calorie content of 54 brands of hot dogs. Note that in this example there is no clear explanatory-response distinction, and we decided to have sodium content as the explanatory variable, and calorie content as the response variable.<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img class=\"img-responsive popimg aligncenter\" title=\"\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot20.gif\" alt=\"\" \/><\/span><\/span>\r\n<p id=\"N10AD1\">The scatterplot displays a positive relationship, which means that hot dogs containing more sodium tend to be higher in calories.<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img class=\"img-responsive popimg aligncenter\" title=\"\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot21.gif\" alt=\"\" \/><\/span><\/span>\r\n<p id=\"N10AD9\">The form of the relationship, however, is kind of hard to determine. 
Maybe if we label the scatterplot, indicating the type of hot dogs, we will get a better understanding of the form.<\/p>\r\n<p id=\"N10ADC\">Here is the labeled scatterplot, with the three different colors representing the three types of hot dogs, as indicated.<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img class=\"img-responsive popimg aligncenter\" title=\"\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot22.gif\" alt=\"\" \/><\/span><\/span>\r\n<p id=\"N10AE4\">The display does give us more insight about the form of the relationship between sodium and calorie content.<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img class=\"img-responsive popimg aligncenter\" title=\"\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot23.gif\" alt=\"\" \/><\/span><\/span>\r\n<p id=\"N10AEC\">It appears that there is a positive relationship within all three types. In other words, we can generally expect hot dogs that are higher in sodium to be higher in calories, no matter what type of hot dog we consider. In addition, we can see that hot dogs made of poultry (indicated in blue) are generally lower in calories. 
This is a result we have seen before.<\/p>\r\n<p id=\"N10AEF\">Interestingly, it appears that the form of the relationship specifically for poultry is further clustered, and<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img class=\"img-responsive popimg aligncenter\" title=\"\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot24.gif\" alt=\"\" \/><\/span><\/span>\r\n<p id=\"N10AF7\">we can only speculate about whether there is another categorical variable that describes these apparent sub-categories of poultry hot dogs.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<h2><span title=\"Quick scroll up\">Let\u2019s Summarize<\/span><\/h2>\r\n<ul>\r\n \t<li>The relationship between two quantitative variables is visually displayed using the\u00a0<em>scatterplot<\/em>, where each point represents an individual. We always plot the explanatory variable on the horizontal X-axis and the response variable on the vertical Y-axis.<\/li>\r\n \t<li>When we explore a relationship using the scatterplot we should describe the\u00a0<em>overall pattern<\/em>\u00a0of the relationship and any\u00a0<em>deviations<\/em>\u00a0from that pattern. To describe the overall pattern consider the\u00a0<em>direction<\/em>,\u00a0<em>form<\/em>\u00a0and\u00a0<em>strength<\/em>\u00a0of the relationship. 
Assessing the strength just by looking at the scatterplot can be problematic; using a numerical measure to determine strength will be discussed later in this course.<\/li>\r\n \t<li>Adding labels to the scatterplot that indicate different groups or categories within the data might help us get more insight about the relationship we are exploring.<\/li>\r\n<\/ul>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>","rendered":"<h2 data-element=\"title\">Scatterplot<\/h2>\n<p id=\"N10AF8\">In the previous two cases we had a categorical explanatory variable, and therefore exploring the relationship between the two variables was done by comparing the distribution of the response variable for each category of the explanatory variable:<\/p>\n<ul>\n<li>In case C\u2192Q we compared distributions of the quantitative response.<\/li>\n<li>In case C\u2192C we compared distributions of the categorical response.<\/li>\n<\/ul>\n<p id=\"N10B04\">Case Q\u2192Q is different in the sense that both variables (in particular the explanatory variable) are quantitative, and therefore, as you\u2019ll discover, this case will require a different kind of treatment and tools. Let\u2019s start with an example:<\/p>\n<div class=\"examplewrap\">\n<div class=\"exHead\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<h4 class=\"exHead\">Highway Signs<\/h4>\n<div class=\"example clearfix\">\n<div>\n<p id=\"N10B0C\">A Pennsylvania research firm conducted a study in which 30 drivers (of ages 18 to 82 years old) were sampled, and for each one, the maximum distance (in feet) at which he\/she could read a newly designed sign was determined. 
The goal of this study was to explore the relationship between a driver\u2019s\u00a0<em>age<\/em>\u00a0and the\u00a0<em>maximum distance<\/em>\u00a0at which signs were legible, and then use the study\u2019s findings to improve safety for older drivers. (Reference: Utts and Heckard,\u00a0<em class=\"italic\">Mind on Statistics<\/em>\u00a0(2002). Original source: Data collected by Last Resource, Inc, Bellefonte, PA.)<\/p>\n<p id=\"N10B19\">Since the purpose of this study is to explore the effect of age on maximum legibility distance,<\/p>\n<ul>\n<li>the\u00a0<em>explanatory<\/em>\u00a0variable is\u00a0<em>Age<\/em>, and<\/li>\n<li>the\u00a0<em>response<\/em>\u00a0variable is\u00a0<em>Distance<\/em>.<\/li>\n<\/ul>\n<p id=\"N10B31\">Here is what the raw data look like:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"A table of the data. There are three columns, &quot;Driver&quot;, &quot;Age&quot;, and &quot;Distance&quot;. &quot;Age&quot; is the Explanatory variable, and &quot;Distance&quot; is the Response variable. Some example data: Driver 1, 18, 510; Driver 2, 32, 410; Driver 3, 55, 420; Driver 4, 23, 510; ... (abbreviated) ... Driver 30, 82, 360;\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot2.gif\" alt=\"A table of the data. There are three columns, &quot;Driver&quot;, &quot;Age&quot;, and &quot;Distance&quot;. &quot;Age&quot; is the Explanatory variable, and &quot;Distance&quot; is the Response variable. Some example data: Driver 1, 18, 510; Driver 2, 32, 410; Driver 3, 55, 420; Driver 4, 23, 510; ... (abbreviated) ... 
Driver 30, 82, 360;\" \/><\/span><\/span><\/p>\n<p id=\"N10B3A\">Note that the data structure is such that for each individual (in this case driver 1, \u2026, driver 30) we have a pair of values (in this case representing the driver\u2019s age and distance). We can therefore think about these data as 30 pairs of values: (18, 510), (32, 410), (55, 420), \u2026 , (82, 360).<\/p>\n<p id=\"N10B3D\">The first step in exploring the relationship between driver age and sign legibility distance is to create an appropriate and informative graphical display. The appropriate graphical display for examining the relationship between two quantitative variables is the\u00a0<em>scatterplot<\/em>. Here is how a scatterplot is constructed for our example:<\/p>\n<p id=\"N10B43\">To create a scatterplot, each pair of values is plotted, so that the value of the explanatory variable (X) is plotted on the horizontal axis, and the value of the response variable (Y) is plotted on the vertical axis. In other words, each individual (driver, in our example) appears on the scatterplot as a single point whose X-coordinate is the value of the explanatory variable for that individual, and whose Y-coordinate is the value of the response variable. Here is an illustration:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"_i_1\" class=\"img-responsive popimg aligncenter\" title=\"A scatterplot. The Y-Axis is labeled &quot;Sign Legibility Distance (feet)&quot;, and the X-axis is labeled &quot;Driver Age (years)&quot;. Already plotted is Driver 1's data point. It is located at x=18,y=510. Also, Driver 2's data point has been plotted, at x=32,y=410.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot3.gif\" alt=\"A scatterplot. 
The Y-Axis is labeled &quot;Sign Legibility Distance (feet)&quot;, and the X-axis is labeled &quot;Driver Age (years)&quot;. Already plotted is Driver 1's data point. It is located at x=18,y=510. Also, Driver 2's data point has been plotted, at x=32,y=410.\" \/><\/span><\/span><\/p>\n<p>&nbsp;<\/p>\n<p>And here is the completed scatterplot:<\/p>\n<p>&nbsp;<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"_i_2\" class=\"img-responsive popimg aligncenter\" title=\"The completed scatterplot. There are 30 data points, shown as black dots scattered about.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot4.gif\" alt=\"The completed scatterplot. There are 30 data points, shown as black dots scattered about.\" \/><\/span><\/span><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<h2>Comment<\/h2>\n<\/div>\n<\/div>\n<div id=\"N10B53\" class=\"section\">\n<div class=\"sectionContain\">\n<p id=\"N10B5A\">It is important to mention again that when creating a scatterplot, the explanatory variable should always be plotted on the horizontal X-axis, and the response variable should be plotted on the vertical Y-axis. If in a specific example we do not have a clear distinction between explanatory and response variables, each of the variables can be plotted on either axis.<\/p>\n<h2><span title=\"Quick scroll up\">Interpreting the scatterplot<\/span><\/h2>\n<p id=\"e38a49531d034a048c6f154e7a603e00\">How do we explore the relationship between two quantitative variables using the scatterplot? 
What should we look at, or pay attention to?<\/p>\n<p id=\"eb40d5eae01e407595a37fb125836f49\">Recall that when we described the distribution of a single quantitative variable with a histogram, we described the overall pattern of the distribution (shape, center, spread) and any deviations from that pattern (outliers).\u00a0<em>We do the same thing with the scatterplot.<\/em>\u00a0The following figure summarizes this point:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"d3face6cbb834388b49ac1efc9610242\" class=\"img-responsive popimg aligncenter\" title=\"When describing the relationship between two quantitative variables using a scatterplot, we look at: (1) The overall pattern, which can be described using direction, form, and strength. We also look at (2) Deviations from the pattern, which result from outliers.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot5.gif\" alt=\"When describing the relationship between two quantitative variables using a scatterplot, we look at: (1) The overall pattern, which can be described using direction, form, and strength. We also look at (2) Deviations from the pattern, which result from outliers.\" \/><\/span><\/span><\/p>\n<p id=\"a623263338be4a7c99041f3864f363fa\">As the figure explains, when describing the\u00a0<em>overall pattern<\/em>\u00a0of the relationship we look at its direction, form and strength.<\/p>\n<ul id=\"fb94ef944962411c97d384fd462c0467\">\n<li>\n<p id=\"cde5b71db4fe4f7e81345591b75df30f\">The\u00a0<em>direction<\/em>\u00a0of the relationship can be positive, negative, or neither:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"b7c4914db39245cba7fcfdb462264d51\" class=\"img-responsive popimg aligncenter\" title=\"A positive relationship. In this case, as we move from left to right along the x-axis, the y values tend to increase. 
In other words, the points on the scatterplot appear to be in a shape which roughly resembles a line going from the bottom left to the top right of the scatter plot. Note that the points are not necessarily actually in a line, but instead their general shape appears to create a trend.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot6.gif\" alt=\"A positive relationship. In this case, as we move from left to right along the x-axis, the y values tend to increase. In other words, the points on the scatterplot appear to be in a shape which roughly resembles a line going from the bottom left to the top right of the scatter plot. Note that the points are not necessarily actually in a line, but instead their general shape appears to create a trend.\" \/><\/span><\/span><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"e34e566e5c124e26adcedf96ed2d706b\" class=\"img-responsive popimg aligncenter\" title=\"A negative relationship. In this case, as we move from left to right along the x-axis, the y values tend to decrease. In other words, the points on the scatterplot appear to be in a shape which roughly resembles a line going from the top left to the bottom right of the scatter plot. Just as in the positive relationship, the points do not have to be on this imaginary line, they merely appear to create a trend that goes from the upper left to lower right.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot7.gif\" alt=\"A negative relationship. In this case, as we move from left to right along the x-axis, the y values tend to decrease. In other words, the points on the scatterplot appear to be in a shape which roughly resembles a line going from the top left to the bottom right of the scatter plot. 
Just as in the positive relationship, the points do not have to be on this imaginary line, they merely appear to create a trend that goes from the upper left to lower right.\" \/><\/span><\/span><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"c39f3a8d67044b53bb2d21feca7f6abf\" class=\"img-responsive popimg aligncenter\" title=\"Neither positive nor negative. The points here do not create a trend like in the positive or negative cases. In this example, they seem to create a V shape.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot8.gif\" alt=\"Neither positive nor negative. The points here do not create a trend like in the positive or negative cases. In this example, they seem to create a V shape.\" \/><\/span><\/span><\/p>\n<p id=\"baa10e6e3a6a4c34bb3b4b7f73f1e59d\">A\u00a0<em>positive (or increasing) relationship<\/em>\u00a0means that an increase in one of the variables is associated with an increase in the other.<\/p>\n<p id=\"b12829196d5a4e578bc819abfb738b68\">A\u00a0<em>negative (or decreasing) relationship<\/em>\u00a0means that an increase in one of the variables is associated with a decrease in the other.<\/p>\n<p id=\"b482bf94be934904bfb718c4ec91098c\">Not all relationships can be classified as either positive or negative.<\/p>\n<\/li>\n<li>\n<p id=\"d723c0777ab246ebaf3839fe78fbcbbe\">The\u00a0<em>form<\/em>\u00a0of the relationship is its general shape. When identifying the form, we try to find the simplest way to describe the shape of the scatterplot. There are many possible forms. 
Here are a couple that are quite common:<\/p>\n<p id=\"c8e3efac6d114a038265902118d3f659\">Relationships with a\u00a0<em>linear<\/em>\u00a0form are most simply described as points scattered about a line:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"b2c9e14b7a324828b3dd175745bc87b3\" class=\"img-responsive popimg aligncenter\" title=\"A scatterplot in which the points are slightly above or below a line which has been drawn through the points. Overall, the points create a shape that appears to be a fat line. In this example, the points create a negative relationship.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot9.gif\" alt=\"A scatterplot in which the points are slightly above or below a line which has been drawn through the points. Overall, the points create a shape that appears to be a fat line. In this example, the points create a negative relationship.\" \/><\/span><\/span><\/p>\n<p id=\"ea147ac517064da28e1af4afb1e6f095\">Relationships with a\u00a0<em>curvilinear<\/em>\u00a0form are most simply described as points dispersed around the same curved line:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"fff08e7678044cd0bada5f14c16ca265\" class=\"img-responsive popimg aligncenter\" title=\"Here, the points in the scatterplot are slightly above or below a line which curves.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot10.gif\" alt=\"Here, the points in the scatterplot are slightly above or below a line which curves.\" \/><\/span><\/span><\/p>\n<p id=\"a8212e214d7b46e382c28fd402e9640f\">There are many other possible forms for the relationship between two quantitative variables, but linear and curvilinear forms are quite common and easy to identify. 
Another form-related pattern that we should be aware of is clusters in the data:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"e56672e33e144a5996e87e06bfff2031\" class=\"img-responsive popimg aligncenter\" title=\"The points in this scatterplot create two groups. The points in a group are close together, and in between the two groups is an empty space in which there are no points. These groups are clusters.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot11.gif\" alt=\"The points in this scatterplot create two groups. The points in a group are close together, and in between the two groups is an empty space in which there are no points. These groups are clusters.\" \/><\/span><\/span><\/li>\n<li>\n<p id=\"f6cb186997e74cd5a8762cf3bdc50cf9\">The\u00a0<em>strength<\/em>\u00a0of the relationship is determined by how closely the data follow the form of the relationship. Let\u2019s look, for example, at the following two scatterplots displaying positive, linear relationships:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"cb5dee43141c4af8b9cb7fb36651c204\" class=\"img-responsive popimg aligncenter\" title=\"A line has been drawn through the points in this scatter plot. The points are very close to the line. This is a strong relationship.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot12.gif\" alt=\"A line has been drawn through the points in this scatter plot. The points are very close to the line. 
This is a strong relationship.\" \/><\/span><\/span><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"afc50c2252024015b04235342bf67e73\" class=\"img-responsive popimg aligncenter\" title=\"Like the previous example, a line has been drawn through the points in the scatterplot. However, there are points both close to the line and quite far. Taken as a whole, the points appear to make a trapezoid instead of a line. This is a weaker relationship than the previous example.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot13.gif\" alt=\"Like the previous example, a line has been drawn through the points in the scatterplot. However, there are points both close to the line and quite far. Taken as a whole, the points appear to make a trapezoid instead of a line. This is a weaker relationship than the previous example.\" \/><\/span><\/span><\/p>\n<p id=\"a519c174fdd34adbb2c0540fd37ecd48\">The strength of the relationship is determined by how closely the data points follow the form. We can see that in the top scatterplot the data points follow the linear pattern quite closely. This is an example of a strong relationship. In the bottom scatterplot, the points also follow the linear pattern, but much less closely, and therefore we can say that the relationship is weaker. In general, though, assessing the strength of a relationship just by looking at the scatterplot is quite problematic, and we need a numerical measure to help us with that. We will discuss that later in this section.<\/p>\n<\/li>\n<\/ul>\n<p id=\"a5a251a525744783bc337238ca86be13\">Data points that\u00a0<em>deviate from the pattern<\/em>\u00a0of the relationship are called\u00a0<em>outliers<\/em>. We will see several examples of outliers during this section. 
Two outliers are illustrated in the scatterplot below:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"d945790b4175450d93809b674091b450\" class=\"img-responsive popimg aligncenter\" title=\"A scatterplot which has a positive relationship. Most of the points are in a line-like shape from the bottom left of the plot to the top right. However, there are two points which do not match this trend. One is far below the majority of the points and the other is far left. These points do not participate in the line-like shape at all.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot14.gif\" alt=\"A scatterplot which has a positive relationship. Most of the points are in a line-like shape from the bottom left of the plot to the top right. However, there are two points which do not match this trend. One is far below the majority of the points and the other is far left. These points do not participate in the line-like shape at all.\" \/><\/span><\/span><\/p>\n<p>Let\u2019s go back now to our example, and use the scatterplot to examine the relationship between the age of the driver and the maximum sign legibility distance. Here is the scatterplot:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"A scatterplot. The Y-Axis is labeled &quot;Sign Legibility Distance (feet)&quot;, and the X-axis is labeled &quot;Driver Age (years)&quot; There are 30 points on the plot.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot15.gif\" alt=\"A scatterplot. 
The Y-Axis is labeled &quot;Sign Legibility Distance (feet)&quot;, and the X-axis is labeled &quot;Driver Age (years)&quot; There are 30 points on the plot.\" \/><\/span><\/span><\/p>\n<p id=\"N10AFF\">The direction of the relationship is\u00a0<em>negative<\/em>, which makes sense in context, since eyesight tends to weaken with age, so older drivers tend to be able to read signs only at shorter distances. An arrow drawn over the scatterplot illustrates the negative direction of this relationship:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"A line from the upper left of the plot to the lower right has been drawn. The points of the scatterplot roughly follow this line.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot16.gif\" alt=\"A line from the upper left of the plot to the lower right has been drawn. The points of the scatterplot roughly follow this line.\" \/><\/span><\/span><\/p>\n<p id=\"N10B0A\">The form of the relationship seems to be\u00a0<em>linear<\/em>. Notice how the points tend to be scattered about the line. Although, as we mentioned earlier, it is problematic to assess the strength without a numerical measure, the relationship appears to be\u00a0<em>moderately strong<\/em>, as the data are fairly tightly scattered about the line. 
Finally, all the data points seem to \u201cobey\u201d the pattern\u2014there\u00a0<em>do not appear to be any outliers<\/em>.<\/p>\n<p id=\"N10B00\">We will now look at two more examples:<\/p>\n<div class=\"examplewrap\">\n<div class=\"exHead\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<h4 class=\"exHead\">Average Gestation Period<\/h4>\n<div class=\"example clearfix\">\n<div>\n<p id=\"N10B08\">The average gestation period, or time of pregnancy, of an animal is closely related to its longevity (the length of its lifespan.) Data on the average gestation period and longevity (in captivity) of 40 different species of animals have been examined, with the purpose of examining how the gestation period of an animal is related to (or can be predicted from) its longevity. (Source: Rossman and Chance. (2001). Workshop statistics: Discovery with data and Minitab. Original source: The 1993 world almanac and book of facts).<\/p>\n<p id=\"N10B0B\">Here is the scatterplot of the data.<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"A scatterplot in which the vertical axis is labeled &quot;Gestation (days)&quot; and it ranges from 0 to 700 days. The horizontal axis is labeled &quot;Longevity (years)&quot; and it ranges from 0 to 40 years.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot17.gif\" alt=\"A scatterplot in which the vertical axis is labeled &quot;Gestation (days)&quot; and it ranges from 0 to 700 days. The horizontal axis is labeled &quot;Longevity (years)&quot; and it ranges from 0 to 40 years.\" \/><\/span><\/span><\/p>\n<p id=\"N10B14\">What can we learn about the relationship from the scatterplot? 
The direction of the relationship is\u00a0<em>positive<\/em>, which means that animals with longer life spans tend to have longer times of pregnancy (this makes intuitive sense). An arrow drawn over the scatterplot below illustrates this:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"The same scatterplot with a line and arrow drawn from the lower left to the upper right corners of the plot. Every other data point is confined to x\u226426 and y\u2264500, but there is one point at roughly x=40 and y=650 which is an outlier. There are also two red vertical lines at x=5 and x=12 which will be explained.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot18.gif\" alt=\"The same scatterplot with a line and arrow drawn from the lower left to the upper right corners of the plot. Every other data point is confined to x\u226426 and y\u2264500, but there is one point at roughly x=40 and y=650 which is an outlier. There are also two red vertical lines at x=5 and x=12 which will be explained.\" \/><\/span><\/span><\/p>\n<p id=\"N10B20\">The form of the relationship is again essentially\u00a0<em>linear<\/em>. There appears to be\u00a0<em>one outlier<\/em>, indicating an animal with exceptionally high longevity and an exceptionally long gestation period. (This animal happens to be the elephant.) Note that while this outlier definitely deviates from the rest of the data in terms of its magnitude, it\u00a0<em>does<\/em>\u00a0follow the direction of the data.<\/p>\n<p id=\"N10B2C\"><em>Comment:<\/em>\u00a0Another feature of the scatterplot that is worth observing is how the variation in gestation increases as longevity increases. This fact is illustrated by the two red vertical lines at the bottom left part of the graph. 
Note that the gestation periods for animals who live 5 years range from about 30 days up to about 120 days. On the other hand, the gestation period of animals who live 12 years varies much more, and ranges from about 60 days up to more than 400 days.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<\/div>\n<\/div>\n<div class=\"examplewrap\">\n<div class=\"exHead\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div class=\"examplewrap\">\n<h4 class=\"exHead\">Fuel Usage<\/h4>\n<div class=\"example clearfix\">\n<div>\n<p id=\"N10B38\">As a third example, consider the relationship between the average amount of fuel used (in liters) to drive a fixed distance in a car (100 kilometers), and the speed at which the car is driven (in kilometers per hour). (Source: Moore and McCabe, (2003). Introduction to the practice of statistics. Original source: T.N. Lam. (1985). \u201cEstimating fuel consumption for engine size,\u201d Journal of Transportation Engineering, vol. 111)<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"A scatterplot of fuel usage in relation to speed. The vertical axis is labeled &quot;Fuel Used (liters\/100km)&quot; and the Horizontal axis is labeled &quot;Speed (km\/h)&quot;\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot19.gif\" alt=\"A scatterplot of fuel usage in relation to speed. 
The vertical axis is labeled &quot;Fuel Used (liters\/100km)&quot; and the Horizontal axis is labeled &quot;Speed (km\/h)&quot;\" \/><\/span><\/span><\/p>\n<p id=\"N10B41\">The data describe a relationship that decreases and then increases\u2014the amount of fuel consumed decreases rapidly to a minimum for a car driving 60 kilometers per hour, and then increases gradually for speeds exceeding 60 kilometers per hour. This suggests that the speed at which a car economizes on fuel the most is about 60 km\/h. This forms a curvilinear relationship that seems to be very strong, as the observations seem to perfectly fit the curve. Finally, there do not appear to be any outliers.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p id=\"N10075\">A study examined how the percentage of participants who completed a survey is related to the monetary incentive that researchers promised to participants. 
Consider the relationship between these two quantitative variables, displayed in the scatterplot below.<\/p>\n<p>&nbsp;<\/p>\n<div class=\"image shouldbeleft\"><img decoding=\"async\" id=\"N10077\" class=\"img-responsive popimg aligncenter\" title=\"scatterplot graph\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/webcontent\/flash\/didigetthis_scatterplot4q1image1.gif\" alt=\"scatterplot graph\" \/><\/div>\n<div><\/div>\n<div>\n<div id=\"h5p-42\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-42\" class=\"h5p-iframe\" data-content-id=\"42\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"3.2 Learn by Doing 1\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"N10B68\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Comment<\/span><\/h2>\n<p id=\"N10B6F\">The example in the last activity provides a great opportunity for interpretation of the form of the relationship in context. Recall that the example examined how the percentage of participants who completed a survey is affected by the monetary incentive that researchers promised to participants. Here again is the scatterplot that displays the relationship:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"_i_3\" class=\"img-responsive popimg aligncenter\" title=\"A scatterplot. The vertical axis is labeled &quot;Percentage Returned&quot; and the Horizontal Axis is labeled &quot;Incentive (dollars)&quot; The shown data closely follows a curved line which grows more quickly at lower values of dollars.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot25.gif\" alt=\"A scatterplot. 
The vertical axis is labeled &quot;Percentage Returned&quot; and the Horizontal Axis is labeled &quot;Incentive (dollars)&quot; The shown data closely follows a curved line which grows more quickly at lower values of dollars.\" \/><\/span><\/span><\/p>\n<p id=\"N10B78\">The positive relationship definitely makes sense in context, but what is the interpretation of the curvilinear form in the context of the problem? How can we explain (in context) the fact that the relationship seems at first to be increasing very rapidly, but then slows down? The following graph will help us:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"_i_4\" class=\"img-responsive popimg aligncenter\" title=\"The same scatterplot, except that some boxes have been drawn. The first box encompasses the area of the plot from x=0,y=0 to x=0,y=16. x=0,y=16 is the location of the first data point, showing that when the incentive is $0, the return rate is 16%. The next box encompasses the area from x=0,y=0 to x=10,y=43. This shows that when the incentive is $10, the return rate is 43%. The next box is the area between x=0,y=0 and x=30,y=54. The last box is from x=0,y=0 to x=40,y=57.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot26.gif\" alt=\"The same scatterplot, except that some boxes have been drawn. The first box encompasses the area of the plot from x=0,y=0 to x=0,y=16. x=0,y=16 is the location of the first data point, showing that when the incentive is $0, the return rate is 16%. The next box encompasses the area from x=0,y=0 to x=10,y=43. This shows that when the incentive is $10, the return rate is 43%. The next box is the area between x=0,y=0 and x=30,y=54. 
The last box is from x=0,y=0 to x=40,y=57.\" \/><\/span><\/span><\/p>\n<p id=\"N10B81\">Note that when the monetary incentive increases from $0 to $10, the percentage of returned surveys increases sharply\u2014an increase of 27 percentage points (from 16% to 43%). However, the same increase of $10 from $30 to $40 doesn\u2019t result in the same dramatic increase in the percentage of returned surveys\u2014it results in an increase of only 3 percentage points (from 54% to 57%). The form displays the phenomenon of \u201cdiminishing returns\u201d\u2014a return rate that after a certain point fails to increase proportionately to additional outlays of investment. $10 is worth more to people relative to $0 than it is relative to $30.<\/p>\n<h2><span title=\"Quick scroll up\">A Labeled Scatterplot<\/span><\/h2>\n<p id=\"N10B0E\">In certain circumstances, it may be reasonable to indicate different subgroups or categories within the data on the scatterplot, by labeling each subgroup differently. The result is called a\u00a0<em>labeled scatterplot<\/em>, and can provide further insight about the relationship we are exploring. Here is an example.<\/p>\n<div class=\"examplewrap\">\n<div class=\"exHead\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div class=\"example clearfix\">\n<h4>Hot Dogs<\/h4>\n<div>\n<p>Recall the hot dog example from case C\u2192Q, in which 54 major hot dog brands were examined. In this study, both the\u00a0<em>calorie content<\/em>\u00a0and the\u00a0<em>sodium level<\/em>\u00a0of each brand were recorded, as well as the\u00a0<em>type<\/em>\u00a0of hot dog: beef, poultry, and meat (mostly pork and beef, but up to 15% poultry meat). 
In this example we will explore the relationship between the sodium level and calorie content of hot dogs, and we will label the three different types of hot dogs to create a labeled scatterplot.<\/p>\n<div>\n<h2 id=\"pagetitle\">Creating a Labeled Scatterplot<\/h2>\n<\/div>\n<div id=\"N10AC2\" class=\"section\">\n<div class=\"sectionContain\">\n<p id=\"N10AC9\">The scatterplot below displays the relationship between the sodium and calorie content of 54 brands of hot dogs. Note that in this example there is no clear explanatory-response distinction; we chose sodium content as the explanatory variable and calorie content as the response variable.<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot20.gif\" alt=\"\" \/><\/span><\/span><\/p>\n<p id=\"N10AD1\">The scatterplot displays a positive relationship, which means that hot dogs containing more sodium tend to be higher in calories.<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot21.gif\" alt=\"\" \/><\/span><\/span><\/p>\n<p id=\"N10AD9\">The form of the relationship, however, is difficult to determine. 
Maybe if we label the scatterplot, indicating the type of hot dogs, we will get a better understanding of the form.<\/p>\n<p id=\"N10ADC\">Here is the labeled scatterplot, with the three different colors representing the three types of hot dogs, as indicated.<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot22.gif\" alt=\"\" \/><\/span><\/span><\/p>\n<p id=\"N10AE4\">The display does give us more insight about the form of the relationship between sodium and calorie content.<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot23.gif\" alt=\"\" \/><\/span><\/span><\/p>\n<p id=\"N10AEC\">It appears that there is a positive relationship within all three types. In other words, we can generally expect hot dogs that are higher in sodium to be higher in calories, no matter what type of hot dog we consider. In addition, we can see that hot dogs made of poultry (indicated in blue) are generally lower in calories. 
This is a result we have seen before.<\/p>\n<p id=\"N10AEF\">Interestingly, it appears that the form of the relationship specifically for poultry is further clustered, and<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m2_examining_relationships\/webcontent\/scatterplot24.gif\" alt=\"\" \/><\/span><\/span><\/p>\n<p id=\"N10AF7\">we can only speculate about whether there is another categorical variable that describes these apparent sub-categories of poultry hot dogs.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<h2><span title=\"Quick scroll up\">Let\u2019s Summarize<\/span><\/h2>\n<ul>\n<li>The relationship between two quantitative variables is visually displayed using the\u00a0<em>scatterplot<\/em>, where each point represents an individual. We always plot the explanatory variable on the horizontal X-axis and the response variable on the vertical Y-axis.<\/li>\n<li>When we explore a relationship using the scatterplot we should describe the\u00a0<em>overall pattern<\/em>\u00a0of the relationship and any\u00a0<em>deviations<\/em>\u00a0from that pattern. To describe the overall pattern consider the\u00a0<em>direction<\/em>,\u00a0<em>form<\/em>\u00a0and\u00a0<em>strength<\/em>\u00a0of the relationship. 
Assessing the strength just by looking at the scatterplot can be problematic; using a numerical measure to determine strength will be discussed later in this course.<\/li>\n<li>Adding labels to the scatterplot that indicate different groups or categories within the data might help us get more insight about the relationship we are exploring.<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"author":150,"menu_order":3,"template":"","meta":{"pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[48],"contributor":[],"license":[],"class_list":["post-485","chapter","type-chapter","status-publish","hentry","chapter-type-numberless"],"part":417,"_links":{"self":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters\/485","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/users\/150"}],"version-history":[{"count":8,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters\/485\/revisions"}],"predecessor-version":[{"id":849,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters\/485\/revisions\/849"}],"part":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/parts\/417"}],"metadata":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters\/485\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/media?parent=485"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapter-type?post=485"},{"taxonomy":"contributor"
,"embeddable":true,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/contributor?post=485"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/license?post=485"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}