{"id":459,"date":"2024-10-18T01:40:04","date_gmt":"2024-10-18T01:40:04","guid":{"rendered":"https:\/\/pressbooks.ccconline.org\/mat1260\/?post_type=chapter&#038;p=459"},"modified":"2025-01-06T19:27:53","modified_gmt":"2025-01-06T19:27:53","slug":"2-3-measures-of-spread","status":"publish","type":"chapter","link":"https:\/\/pressbooks.ccconline.org\/mat1260\/chapter\/2-3-measures-of-spread\/","title":{"raw":"2.3: Measures of Spread","rendered":"2.3: Measures of Spread"},"content":{"raw":"<div id=\"lobjh\" class=\"\">\r\n<h2>Introduction<\/h2>\r\n<\/div>\r\n<div id=\"N10B06\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<p id=\"N10B0D\">So far we have learned about different ways to quantify the center of a distribution. A measure of center by itself is not enough, though, to describe a distribution. Consider the following two distributions of exam scores. Both distributions are centered at 70 (the median of both distributions is approximately 70), but the distributions are quite different. The first distribution has a much larger variability in scores compared to the second one.<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"_i_0\" class=\"img-responsive popimg aligncenter\" title=\"Two dot plots of exam scores. The first plot has a median of approximately 70, but there are scores from below 50 to above 90. In the second dot plot, the median is once again about 70, but this time the range of scores is from about 60 to about 80.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/spread1.gif\" alt=\"Two dot plots of exam scores. The first plot has a median of approximately 70, but there are scores from below 50 to above 90. 
In the second dot plot, the median is once again about 70, but this time the range of scores is from about 60 to about 80.\" \/><\/span><\/span>\r\n<p id=\"N10B16\">To describe a distribution, we therefore need to supplement the graphical display with both a measure of center and a measure of the variability (or spread) of the distribution.<\/p>\r\n<p id=\"N10B19\">In this section, we will discuss the three most commonly used measures of spread:<\/p>\r\n\r\n<ul>\r\n \t<li>Range<\/li>\r\n \t<li>Inter-quartile range (IQR)<\/li>\r\n \t<li>Standard deviation<\/li>\r\n<\/ul>\r\n<p id=\"N10B28\">Like the different measures of center, these measures provide different ways to quantify the variability of a distribution.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div id=\"N10B2D\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Range<\/span><\/h2>\r\n<p id=\"N10B34\">The\u00a0<em>range<\/em>\u00a0covered by the data is the most intuitive measure of variability. 
The range is exactly the distance between the smallest data point (min) and the largest one (Max).<\/p>\r\n\r\n<ul>\r\n \t<li>Range = Max \u2013 min<\/li>\r\n<\/ul>\r\n<p id=\"N10B40\">Note: When we first looked at the histogram and tried to get a first feel for the spread of the data, we were actually\u00a0<em class=\"italic\">approximating<\/em>\u00a0the range, rather than calculating the exact range.<\/p>\r\n\r\n<div class=\"examplewrap\">\r\n<div class=\"exHead\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<h4 class=\"example clearfix\">Best Actress Oscar Winners<\/h4>\r\n<div class=\"example clearfix\">\r\n<div>\r\n<p id=\"N10B4C\">We will continue with the Best Actress Oscar winners example (for the full dataset, see <a href=\"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-content\/uploads\/sites\/206\/2025\/01\/Dataset_-Best-Actress-Oscar-Winners-1970\u20132013.docx\">Dataset_ Best Actress Oscar Winners (1970\u20132013)<\/a>):<\/p>\r\n\r\n<table class=\"formula\">\r\n<tbody>\r\n<tr>\r\n<td>34 34 27 37 42 41 36 32 41 33 31 74 33 49 38 61 21 41 26 80 42 29 33 36 45 49 39 34 26 25 33 35 35 28 30 29 61 32 33 45 29 62 22 44<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<p id=\"N10B5B\">In this example:<\/p>\r\n\r\n<ul>\r\n \t<li>min = 21 (Marlee Matlin for\u00a0<em class=\"italic\">Children of a Lesser God<\/em>, 1986)<\/li>\r\n \t<li>Max = 80 (Jessica Tandy for\u00a0<em class=\"italic\">Driving Miss Daisy<\/em>, 1989)<\/li>\r\n<\/ul>\r\n<p id=\"N10B6F\">The range covered by all the data is 80 \u2013 21 = 59 years.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div id=\"N10AFF\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Inter-Quartile Range (IQR)<\/span><\/h2>\r\nWhile the range quantifies the variability by looking at the range covered by\u00a0<em 
class=\"italic\">ALL<\/em>\u00a0the data, the IQR measures the variability of a distribution by giving us the range covered by the\u00a0<em class=\"italic\">MIDDLE 50%<\/em>\u00a0of the data.\r\n<p id=\"N10B11\">The following picture illustrates this idea. (Think of the horizontal line as representing the data, ordered from the min to the Max.)<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img class=\"img-responsive popimg aligncenter\" title=\"A horizontal line representing all of the data. The entire line represents the range of the data, and the leftmost point is the minimum data point. The rightmost point is the maximum data point. 25% of the range spanning the area between the leftmost point and 1\/4 of the line from the leftmost point is labeled the Bottom 25% of the data. The area from the 1\/4 point to the 3\/4 point is labeled the middle 50% of the data. This is where the IQR is calculated. Indeed, the middle 50% represents half of the line. The rest of the line, the remaining 1\/4 from the 3\/4 point to the rightmost point, is the top 25% of the data.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/spread2.gif\" alt=\"A horizontal line representing all of the data. The entire line represents the range of the data, and the leftmost point is the minimum data point. The rightmost point is the maximum data point. 25% of the range spanning the area between the leftmost point and 1\/4 of the line from the leftmost point is labeled the Bottom 25% of the data. The area from the 1\/4 point to the 3\/4 point is labeled the middle 50% of the data. This is where the IQR is calculated. Indeed, the middle 50% represents half of the line. 
The rest of the line, the remaining 1\/4 from the 3\/4 point to the rightmost point, is the top 25% of the data.\" \/><\/span><\/span>\r\n<p id=\"N10B1A\">Here is how the IQR is actually found:<\/p>\r\n\r\n<ol>\r\n \t<li>Arrange the data in increasing order, and find the median M. Recall that the median divides the data, so that 50% of the data points are below the median, and 50% of the data points are above the median.<span class=\"imagewrap\"><span class=\"image\"><img id=\"_i_1\" class=\"img-responsive popimg aligncenter\" title=\"A line representing the range of the data. Once again, the leftmost point is the minimum, and the rightmost point is the maximum. At the middle is M, the median. All of the line to the left of M is the bottom 50% of the data, and all of the line to the right of M is the top 50% of the data.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/spread3.gif\" alt=\"A line representing the range of the data. Once again, the leftmost point is the minimum, and the rightmost point is the maximum. At the middle is M, the median. All of the line to the left of M is the bottom 50% of the data, and all of the line to the right of M is the top 50% of the data.\" \/><\/span><\/span><\/li>\r\n \t<li>Find the median of the lower 50% of the data. This is called the first quartile of the distribution, and the point is denoted by Q1. Note from the picture that Q1 divides the lower 50% of the data into two halves, containing 25% of the data points in each half. Q1 is called the first quartile, since one quarter of the data points fall below it.<span class=\"imagewrap\"><span class=\"image\"><img id=\"_i_2\" class=\"img-responsive popimg aligncenter\" title=\"The same line as the image above, except the bottom 50% has been split in half at the median of all of the data in the bottom 50%. This median is Q1. To the left of Q1 is 25% of the data. 
This is between the minimum point and Q1. On the other side of Q1 is another 25% of the data. This is from Q1 to M. Together these two 25% sections make up the bottom 50% of the data. To the right of M is the top 50% of the data, so in total, to the right of Q1 is 25% of the data and the top 50% of the data, for a total of 75% of the data.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/spread4.gif\" alt=\"The same line as the image above, except the bottom 50% has been split in half at the median of all of the data in the bottom 50%. This median is Q1. To the left of Q1 is 25% of the data. This is between the minimum point and Q1. On the other side of Q1 is another 25% of the data. This is from Q1 to M. Together these two 25% sections make up the bottom 50% of the data. To the right of M is the top 50% of the data, so in total, to the right of Q1 is 25% of the data and the top 50% of the data, for a total of 75% of the data.\" \/><\/span><\/span><\/li>\r\n \t<li>Repeat this process for the top 50% of the data: find the median of the top 50% of the data. This point is called the third quartile of the distribution, and is denoted by Q3. Note from the picture that Q3 divides the top 50% of the data into two halves, with 25% of the data points in each. Q3 is called the third quartile, since three quarters of the data points fall below it.<span class=\"imagewrap\"><span class=\"image\"><img id=\"_i_3\" class=\"img-responsive popimg aligncenter\" title=\"The same line as the image above, except the top 50% has been split in half at the median of all of the data in the top 50%. This median is Q3. To the left of Q3 is 25% of the data. This is between M and Q3. On the other side of Q3 is another 25% of the data. This is from Q3 to the maximum point. Together these two 25% sections make up the top 50% of the data. 
To the left of M is the bottom 50% of the data, so in total, to the left of Q3 is 25% of the data and the bottom 50% of the data, for a total of 75% of the data.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/spread5.gif\" alt=\"The same line as the image above, except the top 50% has been split in half at the median of all of the data in the top 50%. This median is Q3. To the left of Q3 is 25% of the data. This is between M and Q3. On the other side of Q3 is another 25% of the data. This is from Q3 to the maximum point. Together these two 25% sections make up the top 50% of the data. To the left of M is the bottom 50% of the data, so in total, to the left of Q3 is 25% of the data and the bottom 50% of the data, for a total of 75% of the data.\" \/><\/span><\/span><\/li>\r\n \t<li>The middle 50% of the data falls between Q1 and Q3, and therefore:\r\n<p id=\"N10B3C\">IQR = Q3 \u2013 Q1<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"_i_4\" class=\"img-responsive popimg aligncenter\" title=\"A line representing the range of data. The leftmost point is the minimum point and the rightmost point is the maximum point. 25% of the line starting at the minimum point is the area to the left of Q1. To the right of Q1, going right another 25% of the line brings us to M. Going right another 25% brings us to Q3, and the last 25% brings us to the maximum point. The line segment between Q1 and Q3 is the middle 50% of the data, which is used to calculate IQR = Q3-Q1\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/spread6.gif\" alt=\"A line representing the range of data. The leftmost point is the minimum point and the rightmost point is the maximum point. 25% of the line starting at the minimum point is the area to the left of Q1. 
To the right of Q1, going right another 25% of the line brings us to M. Going right another 25% brings us to Q3, and the last 25% brings us to the maximum point. The line segment between Q1 and Q3 is the middle 50% of the data, which is used to calculate IQR = Q3-Q1\" \/><\/span><\/span><\/li>\r\n<\/ol>\r\n<\/div>\r\n<\/div>\r\n<div id=\"N10B49\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Comments<\/span><\/h2>\r\n<ol>\r\n \t<li>\r\n<p id=\"N10B54\">The last picture shows that Q1, M, and Q3 divide the data into four quarters with 25% of the data points in each, so the median is in effect the second quartile. The use of IQR = Q3 \u2013 Q1 as a measure of spread is therefore particularly appropriate when the median M is used as a measure of center.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"N10B5A\">We can define a bit more precisely what is considered the bottom or top 50% of the data. The bottom (top) 50% of the data consists of all the observations whose position in the ordered list is to the left (right) of the location of the overall median M. The following picture illustrates this for the simple cases of n = 7 and n = 8.<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"_i_5\" class=\"img-responsive popimg aligncenter\" title=\"Two sets of dots. The first set of dots consists of 7 dots. These dots represent data points, and they are ordered so that the dots are in a line, from least to greatest. The 4th dot is the middle dot, so this is the median. The bottom 50% of the data are the 3 dots to the left of the 4th dot, and the top 50% of the data are the 3 dots to the right of the 4th dot. In the second set of dots, we have 8 dots, arranged from least to greatest. There is no middle dot, so the median M is the average of the 4th and 5th dots. 
The 4 dots from the 1st to 4th dot are the bottom 50% of the data, and the four dots from the 5th to 8th are the top 50% of the data.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/spread7.gif\" alt=\"Two sets of dots. The first set of dots consists of 7 dots. These dots represent data points, and they are ordered so that the dots are in a line, from least to greatest. The 4th dot is the middle dot, so this is the median. The bottom 50% of the data are the 3 dots to the left of the 4th dot, and the top 50% of the data are the 3 dots to the right of the 4th dot. In the second set of dots, we have 8 dots, arranged from least to greatest. There is no middle dot, so the median M is the average of the 4th and 5th dots. The 4 dots from the 1st to 4th dot are the bottom 50% of the data, and the four dots from the 5th to 8th are the top 50% of the data.\" \/><\/span><\/span>\r\n<p id=\"N10B63\">Note that when n is odd (as in n = 7 above), the median is\u00a0<em>not<\/em>\u00a0included in either the bottom or top half of the data; when n is even (as in n = 8 above), the data are naturally divided into two halves.<\/p>\r\n<\/li>\r\n<\/ol>\r\n<div class=\"examplewrap\">\r\n<div class=\"exHead\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<div class=\"example clearfix\">\r\n<h4>Best Actress Oscar Winners<\/h4>\r\n<div>\r\n<p id=\"N10B04\">To find the IQR of the Best Actress Oscar winners distribution, it will be convenient to use the stemplot.<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img class=\"img-responsive popimg aligncenter\" title=\"Stem plot of the Best Actress Oscar winners. The lower half of the stem plot is the bottom half and the upper half is the top half. 
The stem plot is described in a stem|leaves format in row order. Note that the bottom half ends and the top half begins in the middle of a line (between two leaves). We begin with the bottom half: 2|12 2|56678999 3|012233333344 3|The top half: 3|4 3|5566789 4|1112244 4|99 5| 5| 6|112 6| 7|4 7| 8|0\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/eda_examining_distributions_best_actress_stemplot_IQR.jpg\" alt=\"Stem plot of the Best Actress Oscar winners. The lower half of the stem plot is the bottom half and the upper half is the top half. The stem plot is described in a stem|leaves format in row order. Note that the bottom half ends and the top half begins in the middle of a line (between two leaves). We begin with the bottom half: 2|12 2|56678999 3|012233333344 3|The top half: 3|4 3|5566789 4|1112244 4|99 5| 5| 6|112 6| 7|4 7| 8|0\" \/><\/span><\/span>\r\n\r\nQ1 is the median of the bottom half of the data. 
Since there are 22 observations in that half, Q1 is the mean of the 11th and 12th ranked observations in that half:\r\n\r\n[latex]Q1=\\frac{(30+31)}{2}=30.5[\/latex]\r\n<p id=\"N10B53\">Similarly, Q3 is the median of the top half of the data, and since there are 22 observations in that half, Q3 is the mean of the 11th and 12th ranked observations in that half:<\/p>\r\n[latex]Q3=\\frac{(42+42)}{2}=42[\/latex]\r\n<p id=\"N10B99\">[latex]\\mathrm{IQR}=Q3-Q1=42-30.5=11.5[\/latex]<\/p>\r\n
<p id=\"N10BD6\">Note that in this example, the whole dataset is spread over a range of 59 years, while the middle 50% of the ages is packed into only 11.5 years. The histogram illustrates this:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img class=\"img-responsive popimg aligncenter\" title=\"Histogram of the Best Actress Oscar Winners with the Range and IQR labeled. Recall that the histogram is skewed right. While the range encompasses the entire histogram, the IQR starts at x=30.5 and ends at x=42, which lies within the area of ages with higher frequencies on the histogram.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/eda_examining_distributions_best_actress_histogram_IQR.jpg\" alt=\"Histogram of the Best Actress Oscar Winners with the Range and IQR labeled. Recall that the histogram is skewed right. 
While the range encompasses the entire histogram, the IQR starts at x=30.5 and ends at x=42, which lies within the area of ages with higher frequencies on the histogram.\" width=\"750\" \/><\/span><\/span>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<h2>Comment<\/h2>\r\n<\/div>\r\n<\/div>\r\n<div id=\"N10BE1\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<p id=\"N10BE8\">Software packages use different formulas to calculate the quartiles Q1 and Q3. This should not worry you, as long as you understand the idea behind these concepts. For example, here are the quartile values provided by three different software packages for the ages of Best Actress Oscar winners:<\/p>\r\n<p id=\"N10BEB\"><em>R:<\/em><\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img class=\"img-responsive popimg aligncenter\" title=\"A snippet of output from R. It shows that: Min=21.00, Q1=32.50, Median=35, Q3=41.25, Max=80.00 .\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/spread9r.gif\" alt=\"A snippet of output from R. It shows that: Min=21.00, Q1=32.50, Median=35, Q3=41.25, Max=80.00 .\" \/><\/span><\/span>\r\n<p id=\"N10BF5\"><em>Minitab:<\/em><\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img class=\"img-responsive popimg aligncenter\" title=\"A snippet of output from Minitab. It shows that N=32, Mean=38.53, Median=35.00, TrMean=36.89, StDev=12.95, SE Mean=2.29, Minimum=21.00, Maximum=80.00, Q1=31.50, Q3=41.75 .\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/spread9.gif\" alt=\"A snippet of output from Minitab. 
It shows that N=32, Mean=38.53, Median=35.00, TrMean=36.89, StDev=12.95, SE Mean=2.29, Minimum=21.00, Maximum=80.00, Q1=31.50, Q3=41.75 .\" \/><\/span><\/span>\r\n<p id=\"N10BFF\"><em>Excel:<\/em><\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img class=\"img-responsive popimg aligncenter\" title=\"Four cells from an Excel spreadsheet showing that Q1=32.5 and Q3=41.25 .\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/spread9excel.gif\" alt=\"Four cells from an Excel spreadsheet showing that Q1=32.5 and Q3=41.25 .\" \/><\/span><\/span>\r\n<p id=\"N10C09\"><em>Note<\/em>\u00a0that the values of Q1 and Q3 reported by these software packages do not all agree (R and Excel agree with each other, but Minitab does not), and they also differ slightly from the ones we found here. There are several acceptable ways to compute the quartiles, and they occasionally give different results, especially for datasets where n (the number of observations) is fairly small. As long as you know what the numbers mean and how to interpret them in context, it doesn\u2019t matter much which method you use to find them, since the differences are usually small.<\/p>\r\n\r\n<div class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Using the IQR to Detect Outliers<\/span><\/h2>\r\n<p id=\"N10B21\">So far we have quantified the idea of center, and we are in the middle of discussing how to measure spread, but we haven\u2019t yet talked about a rule that helps us classify extreme observations as outliers. 
The IQR is used as the basis for a rule of thumb for identifying outliers.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div id=\"N10B26\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">The 1.5(IQR) Criterion for Outliers<\/span><\/h2>\r\nAn observation is considered a suspected outlier if it is:\r\n<ul>\r\n \t<li>below Q1 \u2013 1.5(IQR) or<\/li>\r\n \t<li>above Q3 + 1.5(IQR)<\/li>\r\n<\/ul>\r\n<p id=\"N10B39\">The following picture illustrates this rule:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img class=\"img-responsive popimg aligncenter\" title=\"A line representing all of the data. The data is ordered so that the minimum point is the leftmost on the line and the maximum point is the rightmost. At the center of the line is M, the median, and to the left of M is Q1. Even farther to the left of Q1 is Q1-1.5(IQR). Points farther left than this are suspected outliers. To the right of M is Q3, and farther to the right is Q3+1.5(IQR). Points even farther than this are also suspected outliers.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/spread10.gif\" alt=\"A line representing all of the data. The data is ordered so that the minimum point is the leftmost on the line and the maximum point is the rightmost. At the center of the line is M, the median, and to the left of M is Q1. Even farther to the left of Q1 is Q1-1.5(IQR). Points farther left than this are suspected outliers. To the right of M is Q3, and farther to the right is Q3+1.5(IQR). 
Points even farther than this are also suspected outliers.\" \/><\/span><\/span>\r\n<div class=\"examplewrap\">\r\n<div class=\"exHead\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<h4 class=\"exHead\">Best Actress Oscar Winners<\/h4>\r\n<div class=\"example clearfix\">\r\n<div>\r\n<p id=\"N10B47\">We will continue with the Best Actress Oscar winners example (for the full dataset, see <a href=\"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-content\/uploads\/sites\/206\/2025\/01\/Dataset_-Best-Actress-Oscar-Winners-1970\u20132013.docx\">Dataset_ Best Actress Oscar Winners (1970\u20132013)<\/a>):<\/p>\r\n\r\n<table class=\"formula\">\r\n<tbody>\r\n<tr>\r\n<td>34 34 27 37 42 41 36 32 41 33 31 74 33 49 38 61 21 41 26 80 42 29 33 36 45 49 39 34 26 25 33 35 35 28 30 29 61 32 33 45 29 62 22 44<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n\r\nRecall that when we first looked at the histogram of ages of Best Actress Oscar winners, there were 5 observations that looked like possible outliers:\r\n\r\n<span class=\"imagewrap\"><span class=\"image\"><img class=\"img-responsive popimg aligncenter\" title=\"A histogram of the Oscar winners in which for x=62 the frequency is 3 and for x=74 and x=80, the frequency is 1. Those points are thought to be possible outliers.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/eda_examining_distributions_best_actress_histogram_outliers.jpg\" alt=\"A histogram of the Oscar winners in which for x=62 the frequency is 3 and for x=74 and x=80, the frequency is 1. 
Those points are thought to be possible outliers.\" width=\"800\" \/><\/span><\/span>\r\n<p id=\"N10B64\">We can now use the 1.5(IQR) criterion to check whether these 5 observations should indeed be classified as outliers:<\/p>\r\n\r\n<ul>\r\n \t<li>For this example we found that [latex]Q1=30.5[\/latex] and [latex]Q3=42[\/latex], so [latex]\\mathrm{IQR}=42-30.5=11.5[\/latex].<\/li>\r\n \t<li>[latex]Q1-1.5(\\mathrm{IQR})=30.5-(1.5)(11.5)=13.25[\/latex]<\/li>\r\n \t<li>[latex]Q3+1.5(\\mathrm{IQR})=42+(1.5)(11.5)=59.25[\/latex]<\/li>\r\n<\/ul>\r\n<p id=\"N10CA5\">The 1.5(IQR) 
criterion tells us that any observation that is below 13.25 or above 59.25 is considered a suspected outlier.<\/p>\r\n<p id=\"N10CA8\">We therefore conclude that the observations 61, 61, 62, 74 and 80 should be flagged as suspected outliers in the distribution of ages. Note that since the smallest observation is 21, there are no suspected low outliers in this distribution.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\n[h5p id=\"25\"]\r\n\r\n<\/div>\r\n<\/div>\r\n<div id=\"N10B03\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Understanding Outliers<\/span><\/h2>\r\n<p id=\"N10B0A\">We just practiced one way to \u2018flag\u2019 possible outliers. Why is it important to identify possible outliers, and how should they be dealt with? The answers to these questions depend on the reasons for the outlying values. 
Here are several possibilities:<\/p>\r\n\r\n<ol>\r\n \t<li>Even though it is an extreme value, if an outlier can be understood to have been produced by\u00a0<em>essentially the same sort of physical or biological process<\/em>\u00a0as the rest of the data, and if such extreme values are expected to\u00a0<em>eventually occur again<\/em>, then such an outlier indicates something important and interesting about the process you\u2019re investigating, and it\u00a0<em>should be kept<\/em>\u00a0in the data.<\/li>\r\n \t<li>\r\n<p id=\"N10B1E\">If an outlier can be explained to have been produced under fundamentally\u00a0<em>different<\/em>\u00a0conditions from the rest of the data (or by a fundamentally different process), such an outlier\u00a0<em>can be removed<\/em>\u00a0from the data if your goal is to investigate only the process that produced the rest of the data.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"N10B29\">An outlier might indicate a\u00a0<em>mistake<\/em>\u00a0in the data (like a typo, or a measuring error), in which case it\u00a0<em>should be corrected if possible or else removed<\/em>\u00a0from the data before calculating summary statistics or making inferences from the data (and the reason for the mistake should be investigated).<\/p>\r\n<\/li>\r\n<\/ol>\r\n<p id=\"N10B33\"><em>Here are examples of each of these types of outliers:<\/em><\/p>\r\n\r\n<ol>\r\n \t<li>\r\n<p id=\"N10B3B\">The following histogram displays the magnitude of 460 earthquakes in California, occurring in the year 2000, between August 28 and September 9:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img class=\"img-responsive popimg aligncenter\" title=\"A histogram titled &quot;California Earthquakes, Aug 28,2000 - Sep 9,2000&quot;. The histogram is skewed-right. Frequency on the Y-axis ranges from 0 to 90, and on the X-axis is Magnitude in Richter units, from 0 to 5.4 . 
As we go from left to right across the X-axis, the frequency increases to the mode at x=1.2, y=90, then it decreases to 0 after x=3.6. However, beyond 4.8, we see a small bar representing a frequency of 1.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/spread21.gif\" alt=\"A histogram titled &quot;California Earthquakes, Aug 28,2000 - Sep 9,2000&quot;. The histogram is skewed-right. Frequency on the Y-axis ranges from 0 to 90, and on the X-axis is Magnitude in Richter units, from 0 to 5.4 . As we go from left to right across the X-axis, the frequency increases to the mode at x=1.2, y=90, then it decreases to 0 after x=3.6. However, beyond 4.8, we see a small bar representing a frequency of 1.\" \/><\/span><\/span>\r\n<p id=\"N10B44\"><em>Identifying the outlier:<\/em><\/p>\r\n<p id=\"N10B48\">On the very far right edge of the display (beyond 4.8), we see a low bar; this represents one earthquake (because the bar has a height of 1) that was much more severe than the others in the data.<\/p>\r\n<p id=\"N10B4B\"><em>Understanding the outlier:<\/em><\/p>\r\n<p id=\"N10B4F\">In this case, the outlier represents a much stronger earthquake, which is much rarer than the smaller quakes that happen frequently in California.<\/p>\r\n<p id=\"N10B52\"><em>How to handle the outlier:<\/em><\/p>\r\n<p>For many purposes, the relatively severe quakes represented by the outlier might be the most important (because, for instance, that sort of quake has the potential to do more damage to people and infrastructure). The smaller-magnitude quakes might not do any damage, or even be felt at all. 
So it could be important to keep this outlier in the data.<\/p>\r\n<\/li>\r\n \t<li>The following histogram displays the monthly percent return on the stock of Philip Morris (a large tobacco company) from July 1990 to May 1997:<span class=\"imagewrap\"><span class=\"image\"><img class=\"img-responsive popimg aligncenter\" title=\"A histogram titled &quot;Phillip Morris Monthly Stock Return, July 1990 - May 1997&quot;. On the Y-axis is Frequency, from 0 to 30. On the X-axis is Monthly Stock Return in percent. It ranges from -30 to 20. The histogram is skewed-left. At the very left, at the interval x=(-30, -25), a bar indicating frequency of 1 appears. Then, we see no bar until x=-15, where there is a bar of frequency 5. As we continue moving right along the x-axis, frequency increases to the mode of 30 at the interval x=(0,5), and then decreases, until reaching a frequency of 5 at the interval x=(15,20).\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/spread22.gif\" alt=\"A histogram titled &quot;Phillip Morris Monthly Stock Return, July 1990 - May 1997&quot;. On the Y-axis is Frequency, from 0 to 30. On the X-axis is Monthly Stock Return in percent. It ranges from -30 to 20. The histogram is skewed-left. At the very left, at the interval x=(-30, -25), a bar indicating frequency of 1 appears. Then, we see no bar until x=-15, where there is a bar of frequency 5. 
As we continue moving right along the x-axis, frequency increases to the mode of 30 at the interval x=(0,5), and then decreases, until reaching a frequency of 5 at the interval x=(15,20).\" \/><\/span><\/span>\r\n<p id=\"N10B67\"><em>Identifying the outlier:<\/em><\/p>\r\n<p id=\"N10B6C\">On the display, we see a low bar far to the left of the others; this represents one month\u2019s return (because the bar has a height of 1), when the value of Philip Morris stock was unusually low.<\/p>\r\n<p><em>Understanding the outlier:<\/em><\/p>\r\n<p id=\"N10B73\">The explanation for this particular outlier is that, in the early 1990s, highly publicized federal hearings were being conducted regarding the addictiveness of smoking, and there was growing public sentiment against the tobacco companies. The unusually low monthly value in the Philip Morris dataset was due to public pressure against smoking, which negatively affected the company\u2019s stock for that particular month.<\/p>\r\n<p id=\"N10B76\"><em>How to handle the outlier:<\/em><\/p>\r\n<p id=\"N10B7A\">In this case, the outlier was due to unusual conditions during one particular month that aren\u2019t expected to be repeated, and that were fundamentally different from the conditions that produced the values in all the other months. So in this case, it would be reasonable to remove the outlier if we wanted to characterize the \u2018typical\u2019 monthly return on Philip Morris stock.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"N10B7F\">When archaeologists dig up objects such as pieces of ancient pottery, chemical analysis can be performed on the artifacts. The chemical content of pottery can vary depending on the type of clay as well as the particular manufacturing technique. 
The following histogram displays the results of an actual chemical analysis of this kind, performed on 48 ancient Roman pottery artifacts from archaeological sites in Britain:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img class=\"img-responsive popimg aligncenter\" title=\"A histogram titled &quot;Manganous Oxide Content in a sample of Ancient Roman Pottery&quot;. The Y-axis is labeled &quot;number of pottery shards&quot; and ranges from 0 to 20. The X-axis is labeled &quot;manganous oxide [MnO] content&quot; and ranges from 0.0 to 0.4. The histogram is skewed-right. Here are the bars: x=0.0,y=10; x=0.05,y=13; x=0.1,y=18; x=0.15,y=5; x=0.20,y=1; x=0.4,y=1. Note that there are no shards for x=0.25 to x=0.35.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/spread23.gif\" alt=\"A histogram titled &quot;Manganous Oxide Content in a sample of Ancient Roman Pottery&quot;. The Y-axis is labeled &quot;number of pottery shards&quot; and ranges from 0 to 20. The X-axis is labeled &quot;manganous oxide [MnO] content&quot; and ranges from 0.0 to 0.4. The histogram is skewed-right. Here are the bars: x=0.0,y=10; x=0.05,y=13; x=0.1,y=18; x=0.15,y=5; x=0.20,y=1; x=0.4,y=1. 
Note that there are no shards for x=0.25 to x=0.35.\" \/><\/span><\/span>\r\n<p id=\"N10B94\"><em>Identifying the outlier:<\/em><\/p>\r\n<p id=\"N10B98\">On the display, we see a low bar far to the right of the others; this represents one piece of pottery (because the bar has a height of 1), which has a suspiciously high manganous oxide value.<\/p>\r\n<p id=\"N10B9B\"><em>Understanding the outlier:<\/em><\/p>\r\n<p id=\"N10BA0\">Based on comparison with other pieces of pottery found at the same site, and based on expert understanding of the typical content of this particular compound, it was concluded that the unusually high value was most likely a typo that was made when the data were published in the original 1980 paper (it was typed as \u201c.394\u201d but it was probably meant to be \u201c.094\u201d).<\/p>\r\n<p id=\"N10BA3\"><em>How to handle the outlier:<\/em><\/p>\r\n<p id=\"N10BA7\">In this case, since the outlier was judged to be a mistake, it should be removed from the data before further analysis. In fact, removing the outlier is useful not only because it\u2019s a mistake, but also because doing so reveals important structure that was otherwise hidden. This structure is evident in the next display:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img class=\"img-responsive popimg aligncenter\" title=\"A histogram titled &quot;Histogram without the outlier&quot;. The Y-axis is labeled &quot;number of pottery shards&quot;, and it ranges from 0 to 12. The X-axis is labeled &quot;manganous oxide [MnO] content&quot; and ranges from 0.00 to about 0.18. Going from left to right along the X-axis reveals that at x=0, there is a frequency of 10. Then, there are no bars until x=0.04. From here the bars increase in height until x=0.08, where the frequency is 12. 
Then the bars begin to decrease.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/spread24.gif\" alt=\"A histogram titled &quot;Histogram without the outlier&quot;. The Y-axis is labeled &quot;number of pottery shards&quot;, and it ranges from 0 to 12. The X-axis is labeled &quot;manganous oxide [MnO] content&quot; and ranges from 0.00 to about 0.18. Going from left to right along the X-axis reveals that at x=0, there is a frequency of 10. Then, there are no bars until x=0.04. From here the bars increase in height until x=0.08, where the frequency is 12. Then the bars begin to decrease.\" \/><\/span><\/span>\r\n<p id=\"N10BB0\">When the outlier is removed, the display is rescaled so that now we can see the set of 10 pottery pieces that had almost no manganous oxide. These 10 pieces might have been made with a different potting technique, so identifying them as different from the rest is historically useful. This feature became evident only after the outlier was removed.<\/p>\r\n<\/li>\r\n<\/ol>\r\n<\/div>\r\n<\/div>\r\n<div id=\"N10BB7\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Let\u2019s Summarize<\/span><\/h2>\r\n<ul>\r\n \t<li>\r\n<p id=\"N10BC1\">The range covered by the data is the most intuitive measure of spread and is exactly the distance between the smallest data point (min) and the largest one (Max).<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"N10BC5\">Another measure of spread is the inter-quartile range (IQR), which is the range covered by the middle 50% of the data.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"N10BC9\">IQR = Q3 \u2013 Q1, the difference between the third and first quartiles. The first quartile (Q1) is the value such that one quarter (25%) of the data points fall below it, or the median of the bottom half of the data. 
The third quartile (Q3) is the value such that three quarters (75%) of the data points fall below it, or the median of the top half of the data.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"N10BCD\">The IQR should be used as a measure of spread of a distribution only when the median is used as a measure of center.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"N10BD1\">The IQR can be used to detect outliers using the 1.5(IQR) criterion. Outliers are observations that fall below Q1 \u2013 1.5(IQR) or above Q3 + 1.5(IQR).<\/p>\r\n<\/li>\r\n<\/ul>\r\n\r\n<hr \/>\r\n\r\n<div class=\"\">\r\n<h2>Introduction<\/h2>\r\n<\/div>\r\n<div id=\"f37a2e56ba69443aafc171c71d68109d\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<p id=\"a833565489bd452ea43d004ce5199037\">Before we move on to the third measure of spread (standard deviation), we\u2019ll summarize what we\u2019ve learned so far about measuring spread and use it to introduce another graphical display of the distribution of a quantitative variable, the\u00a0<em class=\"italic\">boxplot<\/em>.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div id=\"c061abd640bd43669d21d9221d16e960\" class=\"section purposewrap\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">The Five Number Summary<\/span><\/h2>\r\n<p id=\"c67c291de21942e9bcad953a08e0eefd\">So far, in our discussion about measures of spread, the key players were:<\/p>\r\n\r\n<ul id=\"df7873c703854e76a808c66ac50420be\">\r\n \t<li>\r\n<p id=\"aa1c4b4349a804d1fb1bf166dc3cba834\">the extremes (min and Max), which provide the range covered by all the data; and<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"aec5f5eb73ef7408684e3a28c0c08d6a5\">the quartiles (Q1, M and Q3), which together provide the IQR, the range covered by the middle 50% of the data.<\/p>\r\n<\/li>\r\n<\/ul>\r\n<p id=\"c9a84bdd621e44999145a969a4e95f44\">The combination of all five numbers (min, Q1, M, Q3, Max) is called the\u00a0<em class=\"italic\">five-number summary<\/em>, and provides a quick numerical description 
of both the center and spread of a distribution.<\/p>\r\n\r\n<div id=\"c341b139c7ed4e3fa1483f04434971b1\" class=\"examplewrap\">\r\n<div class=\"exHead\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<div class=\"example clearfix\">\r\n<h4>Best Actress Oscar Winners<\/h4>\r\n<div>\r\n<p id=\"f6bb78fcde9942e1bd775b3eff365469\">We will continue with the Best Actress Oscar winners example (to see the full dataset, open <a href=\"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-content\/uploads\/sites\/206\/2025\/01\/Dataset_-Best-Actress-Oscar-Winners-1970\u20132013.docx\">Dataset_ Best Actress Oscar Winners (1970\u20132013)<\/a>). The ages of the 44 winners are:<\/p>\r\n<p id=\"f8547fe8afc246349132d5ea4bc63e96\">34 34 27 37 42 41 36 32 41 33 31 74 33 49 38 61 21 41 26 80 42 29 33 36 45 49 39 34 26 25 33 35 35 28 30 29 61 32 33 45 29 62 22 44<\/p>\r\n<p>The five-number summary of the age of Best Actress Oscar winners (1970-2013) is:<\/p>\r\n<p id=\"af62c61f2b4c45c5a546e2904fe21aad\">Min: 21<\/p>\r\n<p id=\"d24b34d6e772487e8ec564858d7b1af4\">Q1: 30.5<\/p>\r\n<p id=\"ba2540d172cf402aabe3b3136bfd490d\">M: 34.5<\/p>\r\n<p id=\"f66a04548b9b4dfdbc01d83fa6d5e2db\">Q3: 42<\/p>\r\n<p id=\"c27406bdd1ad42c8a0c48e99e99922d3\">Max: 80<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<p id=\"de1ee0a38af5437eafc3d8dfa466c5f6\">Now that you understand what each of the five numbers means, you can appreciate how much information about the distribution is packed into the five-number summary. 
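<\/p>\r\n<p>Because the five-number summary and the 1.5(IQR) criterion are purely mechanical, they are easy to compute in code. The Python sketch below assumes the convention used in this section: Q1 and Q3 are the medians of the bottom and top halves of the sorted data (when the number of observations is odd, the overall median is left out of both halves). The function names <code>five_number_summary<\/code> and <code>iqr_outliers<\/code> are hypothetical, chosen for illustration. Statistical software may use a different quartile convention and can therefore report slightly different values of Q1 and Q3.<\/p>

```python
from statistics import median


def five_number_summary(data):
    # Return (min, Q1, M, Q3, Max). Q1 is the median of the bottom half of the
    # sorted data and Q3 the median of the top half; when n is odd, the overall
    # median is left out of both halves (the convention assumed in this sketch).
    values = sorted(data)
    n = len(values)
    half = n // 2
    bottom_half = values[:half]
    top_half = values[half + n % 2:]
    return (values[0], median(bottom_half), median(values),
            median(top_half), values[-1])


def iqr_outliers(data):
    # 1.5(IQR) criterion: flag values below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR).
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # Keep every value that lies outside the interval [lower_fence, upper_fence].
    flagged = sorted(x for x in data if lower_fence > x or x > upper_fence)
    return lower_fence, upper_fence, flagged


# Ages of Best Actress Oscar winners, 1970-2013 (the 44 values listed above).
ages = [34, 34, 27, 37, 42, 41, 36, 32, 41, 33, 31, 74, 33, 49, 38,
        61, 21, 41, 26, 80, 42, 29, 33, 36, 45, 49, 39, 34, 26, 25,
        33, 35, 35, 28, 30, 29, 61, 32, 33, 45, 29, 62, 22, 44]

print(five_number_summary(ages))
print(iqr_outliers(ages))
```

<p>On the 44 ages listed above, the first call prints (21, 30.5, 34.5, 42.0, 80), matching the hand calculation, and the second prints (13.25, 59.25, [61, 61, 62, 74, 80]): the same fences and suspected outliers found earlier with the 1.5(IQR) criterion.<\/p>\r\n<p>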
All this information can also be represented visually by using the boxplot.<\/p>\r\n\r\n<div id=\"e17411323cd445e39c706e122a9d1c7b\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">The Boxplot<\/span><\/h2>\r\n<p id=\"f35182ed8ab24147a2f45500437d9040\">The boxplot graphically represents the distribution of a quantitative variable by visually displaying the five-number summary and any observation that was classified as a suspected outlier using the 1.5(IQR) criterion.<\/p>\r\n<p id=\"dbfae605efad4fe8b00068c4e26c7b2a\">There are several ways to plot the whiskers on a boxplot. One convention is to plot whiskers down to the minimum and up to the maximum value. We use the 1.5(IQR) criterion, also known as the Tukey method, for plotting whiskers. First, calculate the IQR, the difference between the 75th and 25th percentiles (Q3 \u2013 Q1), and multiply it by 1.5. Add this value to the 75th percentile. If the result is greater than or equal to the maximum value in the dataset, draw the upper whisker to the maximum value. Otherwise, stop the whisker at the largest value that is less than or equal to 75th percentile + 1.5 * IQR. Plot any values that are greater than this as individual points that are outliers. Similarly, subtract 1.5 * IQR from the 25th percentile. If the result is less than or equal to the minimum value in the dataset, draw the lower whisker to the minimum value. Otherwise, stop the whisker at the smallest value that is greater than or equal to 25th percentile \u2013 1.5 * IQR. 
Plot any values that are smaller than this as individual points that are outliers.<\/p>\r\n<p id=\"b12574f52c714dab81f422e9f0c29ad4\">Using the Best Actress dataset, here is how we determine where to draw the whiskers:<\/p>\r\n\r\n<ul id=\"e5ab7047a2a94d1b90b2381a5c085855\">\r\n \t<li>\r\n<p id=\"aa222eb04661a447ea56993e9a16b4b9b\">Q3 = 42<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"ab637fbdd00ce424ca123bfd19442cbc8\">Q1 = 30.5<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"ae0910e473f8c437891179e493a655a85\">IQR = 42 \u2013 30.5 = 11.5<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"ae2517c9def39461c81df9c1f464db5d8\">1.5 * IQR = 1.5 * 11.5 = 17.25<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"af9f5c7d6238647fdaf29e4546bfa380f\">Q3 + 1.5 * IQR = 42 + 17.25 = 59.25<\/p>\r\n<\/li>\r\n<\/ul>\r\n<p id=\"d0bfde4564da45e794d6e92fba03f3a6\">The largest observation that is less than or equal to 59.25 is 49, so we draw the upper whisker up to 49. All points above 59.25 are plotted individually as outliers (61, 61, 62, 74, 80).<\/p>\r\n<p id=\"abf6ac0bf9ec4cc68a488892e9754f43\">Q1 \u2013 1.5 * IQR = 30.5 \u2013 17.25 = 13.25<\/p>\r\n<p id=\"a55409c343f54a17a0f1c3b3d27792f5\">The smallest observation that is greater than or equal to 13.25 is 21, so we draw the lower whisker down to 21, which is also the minimum. 
There are no low outliers.<\/p>\r\n<p id=\"bedca7b7005745f2a592b31b7e390d49\">Here is how a boxplot is constructed for the \u201cBest Actress\u201d dataset (to see the full dataset, open <a href=\"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-content\/uploads\/sites\/206\/2025\/01\/Dataset_-Best-Actress-Oscar-Winners-1970\u20132013.docx\">Dataset_ Best Actress Oscar Winners (1970\u20132013)<\/a>):<\/p>\r\n&nbsp;\r\n\r\nhttps:\/\/youtube.com\/watch?v=S50-WYpOm4I\r\n<div class=\"figurewrap\">\r\n<div class=\"figure clearfix\">\r\n<div id=\"uwrap_f749d16feff4406789e1cca5c06754d5\" class=\"youtube\">\r\n\r\n<span style=\"text-align: initial; font-size: 1em;\">GeoGebra Group offers a simulation activity where you can practice calculating the median, Q1, Q3, IQR, and outliers and drawing a boxplot. Note that you can edit the data in the chart to see different results.<\/span>\r\n\r\n<\/div>\r\n<div class=\"captionwrap\">\r\n<p id=\"de88a844a7ba4781960528e6323bd2ad\">To view this interactive simulation in a separate window click\u00a0<a href=\"http:\/\/ggbtu.be\/m11008\" target=\"_blank\" rel=\"noopener\">here<\/a>.<\/p>\r\n<a href=\"https:\/\/www.geogebra.org\/m\/KhKTscBY\" target=\"_blank\" rel=\"noopener\">https:\/\/www.geogebra.org\/m\/KhKTscBY<\/a>\r\n<p id=\"e14345eab62040a98e124bac370b72c4\"><a href=\"https:\/\/creativecommons.org\/licenses\/by-sa\/3.0\/\" target=\"_blank\" rel=\"noopener\">CC BY-SA 3.0<\/a>\u00a0by\u00a0<a href=\"http:\/\/www.geogebra.org\/\" target=\"_blank\" rel=\"noopener\">GeoGebra Group<\/a><\/p>\r\n\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<div>\r\n<div class=\"textbox__content\">\r\n\r\n<img id=\"N1007E\" class=\"img-responsive popimg aligncenter\" title=\"boxplot graph with question mark next to the top box\" 
src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/webcontent\/flash\/boxplot2q1_q3.gif\" alt=\"boxplot graph with question mark next to the top box\" width=\"450\" height=\"450\" \/>\r\n\r\n[h5p id=\"26\"]\r\n\r\n&nbsp;\r\n\r\n<img id=\"N1007E\" class=\"img-responsive popimg aligncenter\" title=\"boxplot graph with question mark next to the top line\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/webcontent\/flash\/boxplot2q1_none.gif\" alt=\"boxplot graph with question mark next to the top line\" width=\"450\" height=\"450\" \/>\r\n\r\n[h5p id=\"27\"]\r\n\r\n&nbsp;\r\n\r\n<img id=\"N1007E\" class=\"img-responsive popimg aligncenter\" title=\"boxplot graph with question mark next to a star above the top line\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/webcontent\/flash\/boxplot2q1_max.gif\" alt=\"boxplot graph with question mark next to a star above the top line\" width=\"450\" height=\"450\" \/>\r\n\r\n[h5p id=\"28\"]\r\n\r\n&nbsp;\r\n\r\n<img id=\"N1007E\" class=\"img-responsive popimg aligncenter\" title=\"boxplot graph with question mark next to the bottom line\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/webcontent\/flash\/boxplot2q1_min.gif\" alt=\"boxplot graph with question mark next to the bottom line\" width=\"450\" height=\"450\" \/>\r\n\r\n[h5p id=\"29\"]\r\n\r\n&nbsp;\r\n\r\n<img class=\"aligncenter\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/webcontent\/flash\/boxplot2q1_q1.gif\" alt=\"boxplot graph with question mark next to the bottom box\" width=\"500\" height=\"374\" \/>\r\n\r\n[h5p id=\"30\"]\r\n\r\n&nbsp;\r\n\r\n<img id=\"N1007E\" class=\"img-responsive popimg aligncenter\" title=\"boxplot graph with question mark next to the 
middle line\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/webcontent\/flash\/boxplot2q1_m.gif\" alt=\"boxplot graph with question mark next to the middle line\" width=\"450\" height=\"450\" \/>\r\n\r\n[h5p id=\"31\"]\r\n\r\n&nbsp;\r\n\r\n<img id=\"N1007E\" class=\"img-responsive popimg aligncenter\" title=\"boxplot graph with question mark next to red arrow going from bottom line to top star\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/webcontent\/flash\/boxplot2q1_range.gif\" alt=\"boxplot graph with question mark next to red arrow going from bottom line to top star\" width=\"450\" height=\"450\" \/>\r\n\r\n[h5p id=\"32\"]\r\n\r\n&nbsp;\r\n\r\n<img id=\"N1007E\" class=\"img-responsive popimg aligncenter\" title=\"boxplot graph with question mark next to arrow marking the height of both boxes\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/webcontent\/flash\/boxplot2q1_iqr.gif\" alt=\"boxplot graph with question mark next to arrow marking the height of both boxes\" width=\"450\" height=\"450\" \/>\r\n\r\n[h5p id=\"33\"]\r\n\r\n<\/div>\r\n<\/div>\r\n<div><\/div>\r\n<\/div>\r\n<\/div>\r\n
<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p id=\"N1007C\">The boxplot below displays ratings for TV shows during sweeps week:<\/p>\r\n\r\n<div class=\"image shouldbeleft\"><img id=\"N1007E\" class=\"img-responsive popimg aligncenter\" title=\"boxplot graph for tv show ratings\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/boxplot10.gif\" alt=\"boxplot graph for tv show ratings\" \/><\/div>\r\n<div>[h5p id=\"34\"]<\/div>\r\n<\/div>\r\n<\/div>\r\n<div id=\"cf7fe0ff5d2540f9927123f2863ca048\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span title=\"Quick scroll up\">Side-By-Side (Comparative) Boxplots<\/span><\/h2>\r\n<p id=\"a4f537f74bff461aaf1dab3ae8a74c6c\">As we learned in the beginning of this module, the distribution of a quantitative variable is best represented graphically by a histogram. 
Boxplots are most useful when presented side-by-side for comparing and contrasting distributions from two or more groups.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div id=\"ca9b36bab98e4036954a591969315be9\" class=\"examplewrap\">\r\n<div class=\"exHead\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<h4 class=\"exHead\">Best Actor\/Actress Oscar Winners<\/h4>\r\n<div class=\"example clearfix\">\r\n<p id=\"a7dbed46f0de4d0a94e5b6ea9dcfd556\">So far we have examined the age distributions of Oscar winners for males and females separately.<\/p>\r\n<p id=\"fe9af4e22afa4ad2b0a5b7d9b74a2c35\">It will be interesting to\u00a0<em class=\"italic\">compare<\/em>\u00a0the age distributions of actors and actresses who won best acting Oscars. To do that, we will look at side-by-side boxplots of the age distributions by gender. Recall also that we found the five-number summary and means for both distributions. For the Best Actress dataset, we did the calculations by hand. For the Best Actor dataset, we used statistical software, and here are the results we got:<\/p>\r\n\r\n<div class=\"Excel2019PC altContentOn\">\r\n<div class=\"alternative\">\r\n<ul id=\"b9b69db81c134c27bec90742ee3860cd\">\r\n \t<li>\r\n<p id=\"de6e4a27a2144c5a87cab2e99eabe8ea\">Actors: min = 31, Q1 = 38, M = 43.5, Q3 = 50.5, Max = 76<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"f264b0f3cfdb4c2aaacdb1326dd3369b\">Actresses: min = 21, Q1 = 30.5, M = 34.5, Q3 = 42, Max = 80<\/p>\r\n<\/li>\r\n<\/ul>\r\n<p id=\"bff1026cf49241c4b47636b7ce64753e\">Based on the graph and numerical measures, we can make the following comparison between the two distributions:<\/p>\r\n<p id=\"b7d00b85cece48a6b97bf3415303556b\"><em class=\"bold\">Center:<\/em>\u00a0The graph reveals that the males\u2019 age distribution is centered at higher ages than the females\u2019 age distribution. This is supported by the numerical measures. 
The median age for females (34.5) is lower than for males (43.5). In fact, even the third quartile of the females\u2019 distribution (42) is lower than the median age for males. We therefore conclude that, in general, actresses win the Best Actress Oscar at a younger age than actors win the Best Actor Oscar.<\/p>\r\n<p id=\"eb69d02be1514d1da883f699fa140014\"><em class=\"bold\">Spread:<\/em>\u00a0Judging by the range of the data, there is much more variability in the females\u2019 distribution (range = 80 \u2013 21 = 59) than in the males\u2019 distribution (range = 76 \u2013 31 = 45). On the other hand, if we look at the IQR, which measures the variability only among the middle 50% of the distribution, we see slightly more spread in the ages of males (IQR = 50.5 \u2013 38 = 12.5) than females (IQR = 42 \u2013 30.5 = 11.5). We conclude that among all the winners, the actors\u2019 ages are more alike than the actresses\u2019 ages. However, the middle 50% of the actresses\u2019 age distribution is more homogeneous than the middle 50% of the actors\u2019.<\/p>\r\n<p id=\"ea420af3ae6743bda47a0013d7f1b1d9\"><em class=\"bold\">Outliers:<\/em>\u00a0We see that we have outliers in both distributions. There is only one high outlier in the actors\u2019 distribution (76, Henry Fonda,\u00a0<em class=\"italic\">On Golden Pond<\/em>), compared with five high outliers in the actresses\u2019 distribution (61, 61, 62, 74 and 80).<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div id=\"bc6c2f8cb179418e8b7f2f18af190b45\" class=\"examplewrap\">\r\n<div class=\"exHead\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<h4 class=\"exHead\">Temperature of Pittsburgh vs. 
San Francisco<\/h4>\r\n<div class=\"example clearfix\">\r\n<div>\r\n<p id=\"addd9131226a4b9fa9b98f4a4071b080\">In order to compare the average high temperatures of Pittsburgh to those in San Francisco we will look at the following side-by-side boxplots, and supplement the graph with the descriptive statistics of each of the two distributions.<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"b4efb570be4b4c21a236493cd8951c27\" class=\"img-responsive popimg\" title=\"A box plot titled &quot;Average High Temperature: San Francisco vs. Pittsburgh&quot;. The vertical axis is in units of Temperature (F), and it goes from 30-80. There are two box plots, one for Pittsburgh and one for San Francisco.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/boxplot8.gif\" alt=\"A box plot titled &quot;Average High Temperature: San Francisco vs. Pittsburgh&quot;. The vertical axis is in units of Temperature (F), and it goes from 30-80.
There are two box plots, one for Pittsburgh and one for San Francisco.\" \/><\/span><\/span>\r\n<table id=\"fa637b2947c143bf9b6ec8a4116bb0bd_bx\" class=\"table labeled\">\r\n<tfoot>\r\n<tr>\r\n<td class=\"captionwrap\"><\/td>\r\n<\/tr>\r\n<\/tfoot>\r\n<tbody>\r\n<tr>\r\n<td>\r\n<table id=\"fa637b2947c143bf9b6ec8a4116bb0bd\" class=\"wbtable plain\">\r\n<thead>\r\n<tr>\r\n<th colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"abe5c26b236a64c92ab6fda92bd392400\">Statistic<\/p>\r\n<\/th>\r\n<th colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"ae3c42c26cbf84afab9ee7164b4a9bf44\">San Francisco<\/p>\r\n<\/th>\r\n<th colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"aa49bd1d25158497887686a8b78fe1793\">Pittsburgh<\/p>\r\n<\/th>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr>\r\n<th colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"aee6dfb08935c4374b37763cf4ef5bfea\">min<\/p>\r\n<\/th>\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"af2972a1e3f6042af959296958f93bd99\">56.3<\/p>\r\n<\/td>\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"aaa8a50a5ec784f448a50e70ede0a5dbf\">33.7<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr class=\"e\">\r\n<th colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"abb71db9f5fe54afa966be95b12436084\">Q1<\/p>\r\n<\/th>\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"ab97c99caf6c6411b92e5b6980a507161\">60.2<\/p>\r\n<\/td>\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"ad22ae158d2c24313bbea13e3a64929d2\">41.2<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<th colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"aa590a14a3fee4d9b8920737af53ff36c\">Median<\/p>\r\n<\/th>\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"af1e01b5d49314ebc9b9f654322ec61ea\">62.7<\/p>\r\n<\/td>\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"aaf85996c37f44c64b764a0b99402a3bd\">61.4<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr class=\"e\">\r\n<th colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p 
id=\"ae2135487985f44bbbcaea05a6a62a4df\">Q3<\/p>\r\n<\/th>\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"ab903ec3c16a54101bb10f1808c3a147d\">65.35<\/p>\r\n<\/td>\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"ab57b364a0f78478aac015300202fc5cc\">77.75<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<th colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"ac211a0c29c4746af8d177e561236a1a7\">Max<\/p>\r\n<\/th>\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"aacc9893314164721ae6c3d924bcb411f\">68.7<\/p>\r\n<\/td>\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"aedd79e0c13de44f1ad84aa65538216a1\">82.6<\/p>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<p id=\"b8e22ce18a4f47bdae55d024611b2b3f\">In the graph, the similarities and differences between the two distributions are striking. Both distributions have roughly the same center (the medians are 61.4 for Pittsburgh and 62.7 for San Francisco). However, the temperatures in Pittsburgh have a much larger variability than the temperatures in San Francisco (range: 49 vs. 12; IQR: 36.5 vs. 5).<\/p>\r\n<p id=\"aa0c792c85ae457cb8e0b3c621154eb8\">In practical terms, the weather in San Francisco is much more consistent than the weather in Pittsburgh, which varies a lot during the year. Because the temperatures in San Francisco vary so little during the year, knowing that the median temperature is around 63 is actually very informative. On the other hand, knowing that the median temperature in Pittsburgh is around 61 tells us very little, since temperatures there vary so much during the year and can be much warmer or much colder.<\/p>\r\n<p id=\"f44f0ee8953441d3a84a160a1ef8a69e\">Note that this example provides more intuition about variability by interpreting small variability as consistency, and large variability as lack of consistency.
Also, through this example we learned that the center of the distribution is more meaningful as a typical value for the distribution when there is little variability (or, as statisticians say, little \u201cnoise\u201d) around it. When there is large variability, the center loses its practical meaning as a typical value.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<h2><span title=\"Quick scroll up\">Let\u2019s Summarize<\/span><\/h2>\r\n<ul id=\"df69ce4741264998a2c991342adb74e8\">\r\n \t<li>\r\n<p id=\"ae0f5c9c45ca54e92b22a07d77187ab90\">The five-number summary of a distribution consists of the median (M), the two quartiles (Q1, Q3) and the extremes (min, Max).<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"af35b33ca7cc949769a97448e32a71f95\">The five-number summary provides a complete numerical description of a distribution. The median describes the center, and the extremes (which give the range) and the quartiles (which give the IQR) describe the spread.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"ab47fb74b3d4b47b8993a0ee72c397d04\">The boxplot graphically represents the distribution of a quantitative variable by visually displaying the five number summary and any observation that was classified as a suspected outlier using the 1.5 (IQR) criterion.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"adb8edfab278c44c1859910f4e1bb2a19\">Boxplots are most useful when presented side-by-side to compare and contrast distributions from two or more groups.<\/p>\r\n<\/li>\r\n<\/ul>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>","rendered":"<div id=\"lobjh\" class=\"\">\n<h2>Introduction<\/h2>\n<\/div>\n<div id=\"N10B06\" class=\"section\">\n<div class=\"sectionContain\">\n<p id=\"N10B0D\">So far we have learned about different ways to quantify the center of a 
distribution. A measure of center by itself is not enough, though, to describe a distribution. Consider the following two distributions of exam scores. Both distributions are centered at 70 (the median of both distributions is approximately 70), but the distributions are quite different. The first distribution has a much larger variability in scores compared to the second one.<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"_i_0\" class=\"img-responsive popimg aligncenter\" title=\"Two dot plots of exam scores. The first plot has a median of approximately 70, but there are scores from below 50 to above 90. In the second dot plot, the median is once again about 70, but this time the range of scores is from about 60 to about 80.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/spread1.gif\" alt=\"Two dot plots of exam scores. The first plot has a median of approximately 70, but there are scores from below 50 to above 90. 
In the second dot plot, the median is once again about 70, but this time the range of scores is from about 60 to about 80.\" \/><\/span><\/span><\/p>\n<p id=\"N10B16\">In order to describe the distribution, we therefore need to supplement the graphical display not only with a measure of center, but also with a measure of the variability (or spread) of the distribution.<\/p>\n<p id=\"N10B19\">In this section, we will discuss the three most commonly used measures of spread:<\/p>\n<ul>\n<li>Range<\/li>\n<li>Inter-quartile range (IQR)<\/li>\n<li>Standard deviation<\/li>\n<\/ul>\n<p id=\"N10B28\">Like the different measures of center, these measures provide different ways to quantify the variability of the distribution.<\/p>\n<\/div>\n<\/div>\n<div id=\"N10B2D\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Range<\/span><\/h2>\n<p id=\"N10B34\">The\u00a0<em>range<\/em>\u00a0covered by the data is the most intuitive measure of variability. The range is exactly the distance between the smallest data point (min) and the largest one (Max).<\/p>\n<ul>\n<li>Range = Max \u2013 min<\/li>\n<\/ul>\n<p id=\"N10B40\">Note: When we first looked at the histogram, and tried to get a first feel for the spread of the data, we were actually\u00a0<em class=\"italic\">approximating<\/em>\u00a0the range, rather than calculating the exact range.<\/p>\n<div class=\"examplewrap\">\n<div class=\"exHead\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<h4 class=\"example clearfix\">Best Actress Oscar Winners<\/h4>\n<div class=\"example clearfix\">\n<div>\n<p id=\"N10B4C\">We will continue with the Best Actress Oscar winners example (To see the full dataset, <a href=\"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-content\/uploads\/sites\/206\/2025\/01\/Dataset_-Best-Actress-Oscar-Winners-1970\u20132013.docx\">Dataset_ Best Actress 
Oscar Winners (1970\u20132013)<\/a>).<\/p>\n<table class=\"formula\">\n<tbody>\n<tr>\n<td>34 34 27 37 42 41 36 32 41 33 31 74 33 49 38 61 21 41 26 80 42 29 33 36 45 49 39 34 26 25 33 35 35 28 30 29 61 32 33 45 29 62 22 44<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p id=\"N10B5B\">In this example:<\/p>\n<ul>\n<li>min = 21 (Marlee Matlin for\u00a0<em class=\"italic\">Children of a Lesser God<\/em>, 1986)<\/li>\n<li>Max = 80 (Jessica Tandy for\u00a0<em class=\"italic\">Driving Miss Daisy<\/em>, 1989)<\/li>\n<\/ul>\n<p id=\"N10B6F\">The range covered by all the data is 80 \u2013 21 = 59 years.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"N10AFF\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Inter-Quartile Range (IQR)<\/span><\/h2>\n<p>While the range quantifies the variability by looking at the range covered by\u00a0<em class=\"italic\">ALL<\/em>\u00a0the data, the IQR measures the variability of a distribution by giving us the range covered by the\u00a0<em class=\"italic\">MIDDLE 50%<\/em>\u00a0of the data.<\/p>\n<p id=\"N10B11\">The following picture illustrates this idea. (Think of the horizontal line as representing the data, ranging from the min to the Max.)<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"A horizontal line representing all of the data. The entire line represents the range of the data, and the leftmost point is the minimum data point. The rightmost point is the maximum data point. 25% of the range spanning the area between the leftmost point and 1\/4 of the line from the leftmost point is labeled the Bottom 25% of the data. The area from the 1\/4 point to the 3\/4 point is labeled the middle 50% of the data. This is where the IQR is calculated. Indeed, the middle 50% represents half of the line.
The rest of the line, the remaining 1\/4 from the 3\/4 point to the rightmost point, is the top 25% of the data.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/spread2.gif\" alt=\"A horizontal line representing all of the data. The entire line represents the range of the data, and the leftmost point is the minimum data point. The rightmost point is the maximum data point. 25% of the range spanning the area between the leftmost point and 1\/4 of the line from the leftmost point is labeled the Bottom 25% of the data. The area from the 1\/4 point to the 3\/4 point is labeled the middle 50% of the data. This is where the IQR is calculated. Indeed, the middle 50% represents half of the line. The rest of the line, the remaining 1\/4 from the 3\/4 point to the rightmost point, is the top 25% of the data.\" \/><\/span><\/span><\/p>\n<p id=\"N10B1A\">Here is how the IQR is actually found:<\/p>\n<ol>\n<li>Arrange the data in increasing order, and find the median M. Recall that the median divides the data, so that 50% of the data points are below the median, and 50% of the data points are above the median.<span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"_i_1\" class=\"img-responsive popimg aligncenter\" title=\"A line representing the range of the data. Once again, the leftmost point is the minimum, and the rightmost point is the maximum. At the middle is M, the median. All of the line to the left of M is the bottom 50% of the data, and all of the line to the right of M is the top 50% of the data.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/spread3.gif\" alt=\"A line representing the range of the data. Once again, the leftmost point is the minimum, and the rightmost point is the maximum. At the middle is M, the median. 
All of the line to the left of M is the bottom 50% of the data, and all of the line to the right of M is the top 50% of the data.\" \/><\/span><\/span><\/li>\n<li>Find the median of the lower 50% of the data. This is called the first quartile of the distribution, and the point is denoted by Q1. Note from the picture that Q1 divides the lower 50% of the data into two halves, containing 25% of the data points in each half. Q1 is called the first quartile, since one quarter of the data points fall below it.<span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"_i_2\" class=\"img-responsive popimg aligncenter\" title=\"The same line as the image above, except the bottom 50% has been split in half at the median of all of the data in the bottom 50%. This median is Q1. To the left of Q1 is 25% of the data. This is between the minimum point and Q1. On the other side of Q1 is another 25% of the data. This is from Q1 to M. Together these two 25% sections make up the bottom 50% of the data. To the right of M is the top 50% of the data, so in total, to the right of Q1 is 25% of the data and the top 50% of the data, for a total of 75% of the data.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/spread4.gif\" alt=\"The same line as the image above, except the bottom 50% has been split in half at the median of all of the data in the bottom 50%. This median is Q1. To the left of Q1 is 25% of the data. This is between the minimum point and Q1. On the other side of Q1 is another 25% of the data. This is from Q1 to M. Together these two 25% sections make up the bottom 50% of the data. To the right of M is the top 50% of the data, so in total, to the right of Q1 is 25% of the data and the top 50% of the data, for a total of 75% of the data.\" \/><\/span><\/span><\/li>\n<li>Repeat this again for the top 50% of the data. 
Find the median of the top 50% of the data. This point is called the third quartile of the distribution, and is denoted by Q3. Note from the picture that Q3 divides the top 50% of the data into two halves, with 25% of the data points in each. Q3 is called the third quartile, since three quarters of the data points fall below it.<span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"_i_3\" class=\"img-responsive popimg aligncenter\" title=\"The same line as the image above, except the top 50% has been split in half at the median of all of the data in the top 50%. This median is Q3. To the left of Q3 is 25% of the data. This is between M and Q3. On the other side of Q3 is another 25% of the data. This is from Q3 to the maximum point. Together these two 25% sections make up the top 50% of the data. To the left of M is the bottom 50% of the data, so in total, to the left of Q3 is 25% of the data and the bottom 50% of the data, for a total of 75% of the data.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/spread5.gif\" alt=\"The same line as the image above, except the top 50% has been split in half at the median of all of the data in the top 50%. This median is Q3. To the left of Q3 is 25% of the data. This is between M and Q3. On the other side of Q3 is another 25% of the data. This is from Q3 to the maximum point. Together these two 25% sections make up the top 50% of the data.
To the left of M is the bottom 50% of the data, so in total, to the left of Q3 is 25% of the data and the bottom 50% of the data, for a total of 75% of the data.\" \/><\/span><\/span><\/li>\n<li>The middle 50% of the data falls between Q1 and Q3, and therefore:\n<p id=\"N10B3C\">IQR = Q3 \u2013 Q1<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"_i_4\" class=\"img-responsive popimg aligncenter\" title=\"A line representing the range of data. The leftmost point is the minimum point and the rightmost point is the maximum point. 25% of the line starting at the minimum point is the area to the left of Q1. To the right of Q1, going right another 25% of the line brings us to M. Going right another 25% brings us to Q3, and the last 25% brings us to the maximum point. The line segment between Q1 and Q3 is the middle 50% of the data, which is used to calculate IQR = Q3-Q1.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/spread6.gif\" alt=\"A line representing the range of data. The leftmost point is the minimum point and the rightmost point is the maximum point. 25% of the line starting at the minimum point is the area to the left of Q1. To the right of Q1, going right another 25% of the line brings us to M. Going right another 25% brings us to Q3, and the last 25% brings us to the maximum point. The line segment between Q1 and Q3 is the middle 50% of the data, which is used to calculate IQR = Q3-Q1.\" \/><\/span><\/span><\/li>\n<\/ol>\n<\/div>\n<\/div>\n<div id=\"N10B49\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Comments<\/span><\/h2>\n<ol>\n<li>\n<p id=\"N10B54\">The last picture shows that Q1, M, and Q3 divide the data into four quarters with 25% of the data points in each, where the median is essentially the second quartile.
The use of IQR = Q3 \u2013 Q1 as a measure of spread is therefore particularly appropriate when the median M is used as a measure of center.<\/p>\n<\/li>\n<li>\n<p id=\"N10B5A\">We can define a bit more precisely what is considered the bottom or top 50% of the data. The bottom (top) 50% of the data is all the observations whose position in the ordered list is to the left (right) of the location of the overall median M. The following picture will visually illustrate this for the simple cases of n = 7 and n = 8.<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"_i_5\" class=\"img-responsive popimg aligncenter\" title=\"Two sets of dots. The first set of dots consists of 7 dots. These dots represent data points, and they are ordered so that the dots are in a line, from least to greatest. The 4th dot is the middle dot, so this is the median. The bottom 50% of the data are the 3 dots to the left of the 4th dot, and the top 50% of the data are the 3 dots to the right of the 4th dot. In the second set of dots, we have 8 dots, arranged from least to greatest. There is no middle dot, so the median M is the average of the 4th and 5th dots. The 4 dots from the 1st to 4th dot are the bottom 50% of the data, and the four dots from the 5th to 8th are the top 50% of the data.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/spread7.gif\" alt=\"Two sets of dots. The first set of dots consists of 7 dots. These dots represent data points, and they are ordered so that the dots are in a line, from least to greatest. The 4th dot is the middle dot, so this is the median. The bottom 50% of the data are the 3 dots to the left of the 4th dot, and the top 50% of the data are the 3 dots to the right of the 4th dot. In the second set of dots, we have 8 dots, arranged from least to greatest. 
There is no middle dot, so the median M is the average of the 4th and 5th dots. The 4 dots from the 1st to 4th dot are the bottom 50% of the data, and the four dots from the 5th to 8th are the top 50% of the data.\" \/><\/span><\/span><\/p>\n<p id=\"N10B63\">Note that when n is odd (as in n = 7 above), the median is\u00a0<em>not<\/em>\u00a0included in either the bottom or top half of the data; when n is even (as in n = 8 above), the data are naturally divided into two halves.<\/p>\n<\/li>\n<\/ol>\n<div class=\"examplewrap\">\n<div class=\"exHead\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div class=\"example clearfix\">\n<h4>Best Actress Oscar Winners<\/h4>\n<div>\n<p id=\"N10B04\">To find the IQR of the Best Actress Oscar winners distribution, it will be convenient to use the stemplot.<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"Stem plot of the Best Actress Oscar winners. The lower half of the stem plot is the bottom half and the upper half is the top half. The stem plot is described in a stem|leaves format in row order. Note that the bottom half ends and the top half begins in the middle of a line (between two leaves). We begin with the bottom half: 2|12 2|56678999 3|012233333344 3|The top half: 3|4 3|5566789 4|1112244 4|99 5| 5| 6|112 6| 7|4 7| 8|0\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/eda_examining_distributions_best_actress_stemplot_IQR.jpg\" alt=\"Stem plot of the Best Actress Oscar winners. The lower half of the stem plot is the bottom half and the upper half is the top half. The stem plot is described in a stem|leaves format in row order.
Note that the bottom half ends and the top half begins in the middle of a line (between two leaves). We begin with the bottom half: 2|12 2|56678999 3|012233333344 3|The top half: 3|4 3|5566789 4|1112244 4|99 5| 5| 6|112 6| 7|4 7| 8|0\" \/><\/span><\/span><\/p>\n<p>Q1 is the median of the bottom half of the data. Since there are 22 observations in that half, Q1 is the mean of the 11th and 12th ranked observations in that half:<\/p>\n<p>[latex]Q1=\\frac{(30+31)}{2}=30.5[\/latex]<\/p>\n<p id=\"N10B53\">Similarly, Q3 is the median of the top half of the data, and since there are 22 observations in that half, Q3 is the mean of the 11th and 12th ranked observations in that half:<\/p>\n<p>[latex]Q3=\\frac{(42+42)}{2}=42[\/latex]<\/p>\n<p id=\"N10B99\">[latex]\\text{IQR}=Q3-Q1=42-30.5=11.5[\/latex]<\/p>\n<p id=\"N10BD6\">Note that in this example, the whole dataset is spread over a range of 59 years, while the middle 50% of the ages is packed into only 11.5 years. Looking again at the histogram illustrates this:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"Histogram of the Best Actress Oscar Winners with the Range and IQR labeled. Recall that the histogram is skewed right. While the range encompasses the entire histogram, the IQR starts at x=30.5 and ends at x=42 , which is located within area of ages with higher frequencies on the histogram.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/eda_examining_distributions_best_actress_histogram_IQR.jpg\" alt=\"Histogram of the Best Actress Oscar Winners with the Range and IQR labeled. Recall that the histogram is skewed right.
While the range encompasses the entire histogram, the IQR starts at x=30.5 and ends at x=42 , which is located within area of ages with higher frequencies on the histogram.\" width=\"750\" \/><\/span><\/span><\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<h2>Comment<\/h2>\n<\/div>\n<\/div>\n<div id=\"N10BE1\" class=\"section\">\n<div class=\"sectionContain\">\n<p id=\"N10BE8\">Software packages use different formulas to calculate the quartiles Q1 and Q3. This should not worry you, as long as you understand the idea behind these concepts. For example, here are the quartile values provided by three different software packages for the age of best actress Oscar winners:<\/p>\n<p id=\"N10BEB\"><em>R:<\/em><\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"A snippet of output from R. It shows that: Min=21.00, Q1=32.50, Median=35, Q3=41.25, Max=80.00 .\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/spread9r.gif\" alt=\"A snippet of output from R. It shows that: Min=21.00, Q1=32.50, Median=35, Q3=41.25, Max=80.00 .\" \/><\/span><\/span><\/p>\n<p id=\"N10BF5\"><em>Minitab:<\/em><\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"A snippet of output from Minitab. It shows that N=32, Mean=38.53, Median=35.00, TrMean=36.89, StDev=12.95, SE Mean=2.29, Minimum=21.00, Maximum=80.00, Q1=31.50, Q2=41.75 .\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/spread9.gif\" alt=\"A snippet of output from Minitab. 
It shows that N=32, Mean=38.53, Median=35.00, TrMean=36.89, StDev=12.95, SE Mean=2.29, Minimum=21.00, Maximum=80.00, Q1=31.50, Q2=41.75 .\" \/><\/span><\/span><\/p>\n<p id=\"N10BFF\"><em>Excel:<\/em><\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"Four cells from an Excel spreadsheet showing that Q1=32.5 and Q3=41.25 .\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/spread9excel.gif\" alt=\"Four cells from an Excel spreadsheet showing that Q1=32.5 and Q3=41.25 .\" \/><\/span><\/span><\/p>\n<p id=\"N10C09\"><em>Note<\/em>\u00a0that Q1 and Q3 as reported by the various software packages differ from each other and are also slightly different from the ones we found here. There are different acceptable ways to find the median and the quartiles. These can occasionally give different results, especially for datasets where n (the number of observations) is fairly small. As long as you know what the numbers mean and how to interpret them in context, it doesn\u2019t matter much which method you use to find them, since the differences are usually small.<\/p>\n<div class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Using the IQR to Detect Outliers<\/span><\/h2>\n<p id=\"N10B21\">So far we have quantified the idea of center, and we are in the middle of the discussion about measuring spread, but we haven\u2019t yet discussed a rule that will help us classify extreme observations as outliers.
The IQR is used as the basis for a rule of thumb for identifying outliers.<\/p>\n<\/div>\n<\/div>\n<div id=\"N10B26\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">The 1.5(IQR) Criterion for Outliers<\/span><\/h2>\n<p>An observation is considered a suspected outlier if it is:<\/p>\n<ul>\n<li>below Q1 \u2013 1.5(IQR) or<\/li>\n<li>above Q3 + 1.5(IQR)<\/li>\n<\/ul>\n<p id=\"N10B39\">The following picture illustrates this rule:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"A line representing all of the data. The data is ordered so that the minimum point is the leftmost on the line and the maximum point is the rightmost. At the center of the line is M, the median, and to the left of M is Q1. Even farther to the left of Q1 is Q1-1.5(IQR). Points farther left than this are suspected outliers. To the right of M is Q3, and farther to the right is Q3+1.5(IQR). Points even farther than this are also suspected outliers.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/spread10.gif\" alt=\"A line representing all of the data. The data is ordered so that the minimum point is the leftmost on the line and the maximum point is the rightmost. At the center of the line is M, the median, and to the left of M is Q1. Even farther to the left of Q1 is Q1-1.5(IQR). Points farther left than this are suspected outliers. To the right of M is Q3, and farther to the right is Q3+1.5(IQR). 
Points even farther than this are also suspected outliers.\" \/><\/span><\/span><\/p>\n<div class=\"examplewrap\">\n<div class=\"exHead\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<h4 class=\"exHead\">Best Actress Oscar Winners<\/h4>\n<div class=\"example clearfix\">\n<div>\n<p id=\"N10B47\">We will continue with the Best Actress Oscar winners example (to see the full dataset, open <a href=\"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-content\/uploads\/sites\/206\/2025\/01\/Dataset_-Best-Actress-Oscar-Winners-1970\u20132013.docx\">Dataset_ Best Actress Oscar Winners (1970\u20132013)<\/a>). The ages of the winners are:<\/p>\n<table class=\"formula\">\n<tbody>\n<tr>\n<td>34 34 27 37 42 41 36 32 41 33 31 74 33 49 38 61 21 41 26 80 42 29 33 36 45 49 39 34 26 25 33 35 35 28 30 29 61 32 33 45 29 62 22 44<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<p>Recall that when we first looked at the histogram of ages of Best Actress Oscar winners, there were 5 observations that looked like possible outliers:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"A histogram of the Oscar winners in which for x=62 the frequency is 3 and for x=74 and x=80, the frequency is 1. Those points are thought to be possible outliers.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/eda_examining_distributions_best_actress_histogram_outliers.jpg\" alt=\"A histogram of the Oscar winners in which for x=62 the frequency is 3 and for x=74 and x=80, the frequency is 1. 
Those points are thought to be possible outliers.\" width=\"800\" \/><\/span><\/span><\/p>\n<p id=\"N10B64\">We can now use the 1.5(IQR) criterion to check whether these 5 observations should indeed be classified as outliers:<\/p>\n<ul>\n<li>For this example we found that \\(Q_1 = 30.5\\) and \\(Q_3 = 42\\), so \\(\\text{IQR} = 42 - 30.5 = 11.5\\).<\/li>\n<li>\\(Q_1 - 1.5(\\text{IQR}) = 30.5 - (1.5)(11.5) = 13.25\\)<\/li>\n<li>\\(Q_3 + 1.5(\\text{IQR}) = 42 + (1.5)(11.5) = 59.25\\)<\/li>\n<\/ul>\n<p id=\"N10CA5\">The 1.5(IQR) criterion tells us that any 
observation that is below 13.25 or above 59.25 is considered a suspected outlier.<\/p>\n<p id=\"N10CA8\">We therefore conclude that the observations 61, 61, 62, 74 and 80 should be flagged as suspected outliers in the distribution of ages. Note that since the smallest observation is 21, there are no suspected low outliers in this distribution.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div id=\"h5p-25\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-25\" class=\"h5p-iframe\" data-content-id=\"25\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"2.3 Did I get this? 1\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"N10B03\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Understanding Outliers<\/span><\/h2>\n<p id=\"N10B0A\">We just practiced one way to \u2018flag\u2019 possible outliers. Why is it important to identify possible outliers, and how should they be dealt with? The answers to these questions depend on the reasons for the outlying values. 
Here are several possibilities:<\/p>\n<ol>\n<li>Even though it is an extreme value, if an outlier can be understood to have been produced by\u00a0<em>essentially the same sort of physical or biological process<\/em>\u00a0as the rest of the data, and if such extreme values are expected to\u00a0<em>eventually occur again<\/em>, then such an outlier indicates something important and interesting about the process you\u2019re investigating, and it\u00a0<em>should be kept<\/em>\u00a0in the data.<\/li>\n<li>\n<p id=\"N10B1E\">If an outlier can be explained to have been produced under fundamentally\u00a0<em>different<\/em>\u00a0conditions from the rest of the data (or by a fundamentally different process), such an outlier\u00a0<em>can be removed<\/em>\u00a0from the data if your goal is to investigate only the process that produced the rest of the data.<\/p>\n<\/li>\n<li>\n<p id=\"N10B29\">An outlier might indicate a\u00a0<em>mistake<\/em>\u00a0in the data (like a typo, or a measuring error), in which case it\u00a0<em>should be corrected if possible or else removed<\/em>\u00a0from the data before calculating summary statistics or making inferences from the data (and the reason for the mistake should be investigated).<\/p>\n<\/li>\n<\/ol>\n<p id=\"N10B33\"><em>Here are examples of each of these types of outliers:<\/em><\/p>\n<ol>\n<li>\n<p id=\"N10B3B\">The following histogram displays the magnitude of 460 earthquakes in California, occurring in the year 2000, between August 28 and September 9:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"A histogram titled &quot;California Earthquakes, Aug 28,2000 - Sep 9,2000&quot;. The histogram is skewed-right. Frequency on the Y-axis ranges from 0 to 90, and on the X-axis is Magnitude in Richter units, from 0 to 5.4 . As we go from left to right across the X-axis, the frequency increases to the mode at x=1.2, y=90, then it decreases to 0 after x=3.6. 
However, beyond 4.8, we see a small bar representing a frequency of 1.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/spread21.gif\" alt=\"A histogram titled &quot;California Earthquakes, Aug 28,2000 - Sep 9,2000&quot;. The histogram is skewed-right. Frequency on the Y-axis ranges from 0 to 90, and on the X-axis is Magnitude in Richter units, from 0 to 5.4 . As we go from left to right across the X-axis, the frequency increases to the mode at x=1.2, y=90, then it decreases to 0 after x=3.6. However, beyond 4.8, we see a small bar representing a frequency of 1.\" \/><\/span><\/span><\/p>\n<p id=\"N10B44\"><em>Identifying the outlier:<\/em><\/p>\n<p id=\"N10B48\">On the very far right edge of the display (beyond 4.8), we see a low bar; this represents one earthquake (because the bar has a height of 1) that was much more severe than the others in the data.<\/p>\n<p id=\"N10B4B\"><em>Understanding the outlier:<\/em><\/p>\n<p id=\"N10B4F\">In this case, the outlier represents a much stronger earthquake, which is rarer than the smaller quakes that happen frequently in California.<\/p>\n<p id=\"N10B52\"><em>How to handle the outlier:<\/em><\/p>\n<p>For many purposes, the relatively severe quakes represented by the outlier might be the most important (because, for instance, that sort of quake has the potential to do more damage to people and infrastructure). The smaller-magnitude quakes might not do any damage, or even be felt at all. 
So, for many purposes it could be important to keep this outlier in the data.<\/p>\n<\/li>\n<li>The following histogram displays the monthly percent return on the stock of Philip Morris (a large tobacco company) from July 1990 to May 1997:<span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"A histogram titled &quot;Philip Morris Monthly Stock Return, July 1990 - May 1997&quot;. On the Y-axis is Frequency, from 0 to 30. On the X-axis is Monthly Stock Return in percent. It ranges from -30 to 20. The histogram is skewed-left. At the very left, at the interval x=(-30, -25), a bar indicating frequency of 1 appears. Then, we see no bar until x=-15, where there is a bar of frequency 5. As we continue moving right along the x-axis, frequency increases to the mode of 30 at the interval x=(0,5), and then decreases, until reaching a frequency of 5 at the interval x=(15,20).\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/spread22.gif\" alt=\"A histogram titled &quot;Philip Morris Monthly Stock Return, July 1990 - May 1997&quot;. On the Y-axis is Frequency, from 0 to 30. On the X-axis is Monthly Stock Return in percent. It ranges from -30 to 20. The histogram is skewed-left. At the very left, at the interval x=(-30, -25), a bar indicating frequency of 1 appears. Then, we see no bar until x=-15, where there is a bar of frequency 5. 
As we continue moving right along the x-axis, frequency increases to the mode of 30 at the interval x=(0,5), and then decreases, until reaching a frequency of 5 at the interval x=(15,20).\" \/><\/span><\/span>\n<p id=\"N10B67\"><em>Identifying the outlier:<\/em><\/p>\n<p id=\"N10B6C\">On the display, we see a low bar far to the left of the others; this represents one month\u2019s return (because the bar has a height of 1), when the return on Philip Morris stock was unusually low.<\/p>\n<p><em>Understanding the outlier:<\/em><\/p>\n<p id=\"N10B73\">The explanation for this particular outlier is that, in the early 1990s, there were highly publicized federal hearings regarding the addictiveness of smoking, and public sentiment against the tobacco companies was growing. The unusually low monthly value in the Philip Morris dataset was due to public pressure against smoking, which negatively affected the company\u2019s stock for that particular month.<\/p>\n<p id=\"N10B76\"><em>How to handle the outlier:<\/em><\/p>\n<p id=\"N10B7A\">In this case, the outlier was due to unusual conditions during one particular month that aren\u2019t expected to be repeated, and that were fundamentally different from the conditions that produced the values in all the other months. So in this case, it would be reasonable to remove the outlier if we wanted to characterize the \u2018typical\u2019 monthly return on Philip Morris stock.<\/p>\n<\/li>\n<li>\n<p id=\"N10B7F\">When archaeologists dig up objects such as pieces of ancient pottery, chemical analysis can be performed on the artifacts. The chemical content of pottery can vary depending on the type of clay as well as the particular manufacturing technique. 
The following histogram displays the results of one such chemical analysis, performed on 48 ancient Roman pottery artifacts from archaeological sites in Britain:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"A histogram titled &quot;Manganous Oxide Content in a sample of Ancient Roman Pottery&quot;. The X-axis is labeled &quot;manganous oxide [MnO] content&quot; and ranges from 0.0 to 0.4. The Y-axis is labeled &quot;number of pottery shards&quot; and ranges from 0 to 20. The histogram is skewed-right. Here are the bars: x=0.0,y=10; x=0.05,y=13; x=0.1,y=18; x=0.15,y=5; x=0.20,y=1; x=0.4,y=1. Note that there are no shards for x=0.25 to x=0.35\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/spread23.gif\" alt=\"A histogram titled &quot;Manganous Oxide Content in a sample of Ancient Roman Pottery&quot;. The X-axis is labeled &quot;manganous oxide [MnO] content&quot; and ranges from 0.0 to 0.4. The Y-axis is labeled &quot;number of pottery shards&quot; and ranges from 0 to 20. The histogram is skewed-right. Here are the bars: x=0.0,y=10; x=0.05,y=13; x=0.1,y=18; x=0.15,y=5; x=0.20,y=1; x=0.4,y=1. 
Note that there are no shards for x=0.25 to x=0.35\" \/><\/span><\/span><\/p>\n<p id=\"N10B94\"><em>Identifying the outlier:<\/em><\/p>\n<p id=\"N10B98\">On the display, we see a low bar far to the right of the others; this represents one piece of pottery (because the bar has a height of 1), which has a suspiciously high manganous oxide value.<\/p>\n<p id=\"N10B9B\"><em>Understanding the outlier:<\/em><\/p>\n<p id=\"N10BA0\">Based on comparison with other pieces of pottery found at the same site, and based on expert understanding of the typical content of this particular compound, it was concluded that the unusually high value was most likely a typo that was made when the data were published in the original 1980 paper (it was typed as \u201c.394\u201d but it was probably meant to be \u201c.094\u201d).<\/p>\n<p id=\"N10BA3\"><em>How to handle the outlier:<\/em><\/p>\n<p id=\"N10BA7\">In this case, since the outlier was judged to be a mistake, it should be removed from the data before further analysis. In fact, removing the outlier is useful not only because it\u2019s a mistake, but also because doing so reveals important structure that was otherwise hidden. This structure becomes evident in the next display:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"A histogram titled &quot;Histogram without the outlier&quot;. The Y-axis is labeled &quot;number of pottery shards&quot;, and it ranges from 0 to 12. The X-axis is labeled &quot;manganous oxide [MnO] content&quot; and ranges from 0.00 to about 0.18. Going from left to right along the X-axis reveals that at x=0, there is a frequency of 10. Then, there are no bars until x=0.04. From here the bars increase in height until x=0.08, where the frequency is 12. 
Then the bars begin to decrease.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/spread24.gif\" alt=\"A histogram titled &quot;Histogram without the outlier&quot;. The Y-axis is labeled &quot;number of pottery shards&quot;, and it ranges from 0 to 12. The X-axis is labeled &quot;manganous oxide [MnO] content&quot; and ranges from 0.00 to about 0.18. Going from left to right along the X-axis reveals that at x=0, there is a frequency of 10. Then, there are no bars until x=0.04. From here the bars increase in height until x=0.08, where the frequency is 12. Then the bars begin to decrease.\" \/><\/span><\/span><\/p>\n<p id=\"N10BB0\">When the outlier is removed, the display is rescaled so that we can now see the set of 10 pottery pieces that had almost no manganous oxide. These 10 pieces might have been made with a different potting technique, so identifying them as different from the rest is historically useful. This feature became evident only after the outlier was removed.<\/p>\n<\/li>\n<\/ol>\n<\/div>\n<\/div>\n<div id=\"N10BB7\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Let\u2019s Summarize<\/span><\/h2>\n<ul>\n<li>\n<p id=\"N10BC1\">The range covered by the data is the most intuitive measure of spread and is exactly the distance between the smallest data point (min) and the largest one (Max).<\/p>\n<\/li>\n<li>\n<p id=\"N10BC5\">Another measure of spread is the inter-quartile range (IQR), which is the range covered by the middle 50% of the data.<\/p>\n<\/li>\n<li>\n<p id=\"N10BC9\">IQR = Q3 \u2013 Q1, the difference between the third and first quartiles. The first quartile (Q1) is the value such that one quarter (25%) of the data points fall below it, or the median of the bottom half of the data. 
The third quartile is the value such that three quarters (75%) of the data points fall below it, or the median of the top half of the data.<\/p>\n<\/li>\n<li>\n<p id=\"N10BCD\">The IQR should be used as a measure of spread of a distribution only when the median is used as a measure of center.<\/p>\n<\/li>\n<li>\n<p id=\"N10BD1\">The IQR can be used to detect outliers using the 1.5(IQR) criterion. Outliers are observations that fall below Q1 \u2013 1.5(IQR) or above Q3 + 1.5(IQR).<\/p>\n<\/li>\n<\/ul>\n<hr \/>\n<div class=\"\">\n<h2>Introduction<\/h2>\n<\/div>\n<div id=\"f37a2e56ba69443aafc171c71d68109d\" class=\"section\">\n<div class=\"sectionContain\">\n<p id=\"a833565489bd452ea43d004ce5199037\">Before we move on to the third measure of spread (standard deviation), we\u2019ll summarize what we\u2019ve learned so far about measuring spread and use it to introduce another graphical display of the distribution of a quantitative variable, the\u00a0<em class=\"italic\">boxplot<\/em>.<\/p>\n<\/div>\n<\/div>\n<div id=\"c061abd640bd43669d21d9221d16e960\" class=\"section purposewrap\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">The Five Number Summary<\/span><\/h2>\n<p id=\"c67c291de21942e9bcad953a08e0eefd\">So far, in our discussion about measures of spread, the key players were:<\/p>\n<ul id=\"df7873c703854e76a808c66ac50420be\">\n<li>\n<p id=\"aa1c4b4349a804d1fb1bf166dc3cba834\">the extremes (min and Max), which provide the range covered by all the data; and<\/p>\n<\/li>\n<li>\n<p id=\"aec5f5eb73ef7408684e3a28c0c08d6a5\">the quartiles (Q1, M and Q3), which together provide the IQR, the range covered by the middle 50% of the data.<\/p>\n<\/li>\n<\/ul>\n<p id=\"c9a84bdd621e44999145a969a4e95f44\">The combination of all five numbers (min, Q1, M, Q3, Max) is called the\u00a0<em class=\"italic\">five number summary<\/em>, and provides a quick numerical description of both the center and spread of a distribution.<\/p>\n<div 
id=\"c341b139c7ed4e3fa1483f04434971b1\" class=\"examplewrap\">\n<div class=\"exHead\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div class=\"example clearfix\">\n<h4>Best Actress Oscar Winners<\/h4>\n<div>\n<p id=\"f6bb78fcde9942e1bd775b3eff365469\">We will continue with the Best Actress Oscar winners example (to see the full dataset, open <a href=\"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-content\/uploads\/sites\/206\/2025\/01\/Dataset_-Best-Actress-Oscar-Winners-1970\u20132013.docx\">Dataset_ Best Actress Oscar Winners (1970\u20132013)<\/a>). The ages of the winners are:<\/p>\n<p id=\"f8547fe8afc246349132d5ea4bc63e96\">34 34 27 37 42 41 36 32 41 33 31 74 33 49 38 61 21 41 26 80 42 29 33 36 45 49 39 34 26 25 33 35 35 28 30 29 61 32 33 45 29 62 22 44<\/p>\n<p>The five-number summary of the ages of Best Actress Oscar winners (1970\u20132013) is:<\/p>\n<p id=\"af62c61f2b4c45c5a546e2904fe21aad\">Min: 21<\/p>\n<p id=\"d24b34d6e772487e8ec564858d7b1af4\">Q1: 30.5<\/p>\n<p id=\"ba2540d172cf402aabe3b3136bfd490d\">M: 34.5<\/p>\n<p id=\"f66a04548b9b4dfdbc01d83fa6d5e2db\">Q3: 42<\/p>\n<p id=\"c27406bdd1ad42c8a0c48e99e99922d3\">Max: 80<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p id=\"de1ee0a38af5437eafc3d8dfa466c5f6\">Now that you understand what each of the five numbers means, you can appreciate how much information about the distribution is packed into the five-number summary. 
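<\/p>\n<p>The five-number summary and the 1.5(IQR) cutoffs can also be computed with a short program. The Python sketch below is an illustration rather than part of the original course materials. It assumes the method used in this section (Q1 and Q3 are the medians of the bottom and top halves of the sorted data; when n is odd, this sketch leaves the overall median out of both halves, which is one common convention), and the function names <code>median<\/code>, <code>five_number_summary<\/code> and <code>iqr_fences<\/code> are hypothetical. Statistical software may use a different quartile method and report slightly different quartiles.<\/p>

```python
# A minimal sketch, assuming the 'median of each half' quartile method used
# in this section. The helper names are illustrative, not from any library.

def median(values):
    '''Return the median of a non-empty list of numbers.'''
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2


def five_number_summary(values):
    '''Return (min, Q1, M, Q3, max); Q1 and Q3 are medians of the halves.'''
    s = sorted(values)
    n = len(s)
    if n < 2:
        raise ValueError('need at least two observations')
    half = n // 2
    lower = s[:half]            # bottom half (leaves out the median when n is odd)
    upper = s[half + n % 2:]    # top half (leaves out the median when n is odd)
    return s[0], median(lower), median(s), median(upper), s[-1]


def iqr_fences(q1, q3):
    '''Return the 1.5(IQR) cutoffs (Q1 - 1.5*IQR, Q3 + 1.5*IQR).'''
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


ages = [34, 34, 27, 37, 42, 41, 36, 32, 41, 33, 31, 74, 33, 49, 38,
        61, 21, 41, 26, 80, 42, 29, 33, 36, 45, 49, 39, 34, 26, 25,
        33, 35, 35, 28, 30, 29, 61, 32, 33, 45, 29, 62, 22, 44]

mn, q1, m, q3, mx = five_number_summary(ages)
low, high = iqr_fences(q1, q3)

# Observations outside the cutoffs are flagged as suspected outliers.
outliers = sorted(x for x in ages if x < low or x > high)

# Tukey-style whiskers end at the most extreme observations inside the cutoffs.
lower_whisker = min(x for x in ages if x >= low)
upper_whisker = max(x for x in ages if x <= high)

print('five-number summary:', mn, q1, m, q3, mx)
print('cutoffs:', low, high)
print('suspected outliers:', outliers)
print('whiskers:', lower_whisker, upper_whisker)
```

<p>Run on the 44 ages listed above, this sketch should give min 21, Q1 30.5, M 34.5, Q3 42 and max 80, cutoffs of 13.25 and 59.25, the suspected outliers 61, 61, 62, 74 and 80, and whisker ends at 21 and 49, matching the hand calculations in this section.<\/p>\n<p>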
All this information can also be represented visually by using the boxplot.<\/p>\n<div id=\"e17411323cd445e39c706e122a9d1c7b\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">The Boxplot<\/span><\/h2>\n<p id=\"f35182ed8ab24147a2f45500437d9040\">The boxplot graphically represents the distribution of a quantitative variable by visually displaying the five-number summary and any observation that was classified as a suspected outlier using the 1.5(IQR) criterion.<\/p>\n<p id=\"dbfae605efad4fe8b00068c4e26c7b2a\">There are several ways to draw the whiskers on a boxplot. One convention is to draw them down to the minimum and up to the maximum value. We use the 1.5(IQR) criterion, also known as the Tukey method. First, calculate the IQR, the difference between the 75th and 25th percentiles (Q3 \u2013 Q1), and multiply it by 1.5. Add this value to the 75th percentile. If the result is greater than or equal to the maximum value in the dataset, draw the upper whisker up to the maximum. Otherwise, stop the whisker at the largest value that is less than or equal to the 75th percentile + 1.5 * IQR, and plot any values greater than this cutoff as individual points (suspected outliers). Similarly, subtract 1.5 * IQR from the 25th percentile. If the result is less than or equal to the minimum value in the dataset, draw the lower whisker down to the minimum. Otherwise, stop the whisker at the smallest value that is greater than or equal to the 25th percentile \u2013 1.5 * IQR. 
Plot any values that are smaller than this as individual points that are outliers.<\/p>\n<p id=\"b12574f52c714dab81f422e9f0c29ad4\">Using the Best Actress dataset, here is how we determine where to draw the whiskers:<\/p>\n<ul id=\"e5ab7047a2a94d1b90b2381a5c085855\">\n<li>\n<p id=\"aa222eb04661a447ea56993e9a16b4b9b\">Q3 = 42<\/p>\n<\/li>\n<li>\n<p id=\"ab637fbdd00ce424ca123bfd19442cbc8\">Q1 = 30.5<\/p>\n<\/li>\n<li>\n<p id=\"ae0910e473f8c437891179e493a655a85\">IQR = 42 \u2013 30.5 = 11.5<\/p>\n<\/li>\n<li>\n<p id=\"ae2517c9def39461c81df9c1f464db5d8\">1.5 * IQR = 1.5 * 11.5 = 17.25<\/p>\n<\/li>\n<li>\n<p id=\"af9f5c7d6238647fdaf29e4546bfa380f\">Q3 + 1.5 * IQR = 42 + 17.25 = 59.25<\/p>\n<\/li>\n<\/ul>\n<p id=\"d0bfde4564da45e794d6e92fba03f3a6\">The largest observation that is less than or equal to 59.25 is 49, so we draw the upper whisker up to 49. All observations above 59.25 (61, 61, 62, 74 and 80) are plotted individually as suspected outliers.<\/p>\n<p id=\"abf6ac0bf9ec4cc68a488892e9754f43\">Q1 \u2013 1.5 * IQR = 30.5 \u2013 17.25 = 13.25<\/p>\n<p id=\"a55409c343f54a17a0f1c3b3d27792f5\">The smallest observation that is greater than or equal to 13.25 is 21, so we draw the lower whisker down to 21, which is also the minimum. 
There are no low outliers.<\/p>\n<p id=\"bedca7b7005745f2a592b31b7e390d49\">Here is how a boxplot is constructed for the \u201cBest Actress\u201d dataset (to see the full dataset, open <a href=\"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-content\/uploads\/sites\/206\/2025\/01\/Dataset_-Best-Actress-Oscar-Winners-1970\u20132013.docx\">Dataset_ Best Actress Oscar Winners (1970\u20132013)<\/a>):<\/p>\n<p>&nbsp;<\/p>\n<p><iframe loading=\"lazy\" id=\"oembed-1\" title=\"Constructing a Boxplot\" width=\"500\" height=\"281\" src=\"https:\/\/www.youtube.com\/embed\/S50-WYpOm4I?feature=oembed&#38;rel=0&#38;rel=0\" frameborder=\"0\" allowfullscreen=\"allowfullscreen\"><\/iframe><\/p>\n<div class=\"figurewrap\">\n<div class=\"figure clearfix\">\n<div id=\"uwrap_f749d16feff4406789e1cca5c06754d5\" class=\"youtube\">\n<p><span style=\"text-align: initial; font-size: 1em;\">GeoGebra Group offers a simulation activity where you can practice calculating the median, Q1, Q3, IQR, and outliers and drawing a boxplot. 
Note that you can edit the data in the chart to see different results.<\/span><\/p>\n<\/div>\n<div class=\"captionwrap\">\n<p id=\"de88a844a7ba4781960528e6323bd2ad\">To view this interactive simulation in a separate window click\u00a0<a href=\"http:\/\/ggbtu.be\/m11008\" target=\"_blank\" rel=\"noopener\">here<\/a>.<\/p>\n<p><a href=\"https:\/\/www.geogebra.org\/m\/KhKTscBY\" target=\"_blank\" rel=\"noopener\">https:\/\/www.geogebra.org\/m\/KhKTscBY<\/a><\/p>\n<p id=\"e14345eab62040a98e124bac370b72c4\"><a href=\"https:\/\/creativecommons.org\/licenses\/by-sa\/3.0\/\" target=\"_blank\" rel=\"noopener\">CC BY-SA 3.0<\/a>\u00a0by\u00a0<a href=\"http:\/\/www.geogebra.org\/\" target=\"_blank\" rel=\"noopener\">GeoGebra Group<\/a><\/p>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div>\n<div class=\"textbox__content\">\n<p><img loading=\"lazy\" decoding=\"async\" id=\"N1007E\" class=\"img-responsive popimg aligncenter\" title=\"boxplot graph with question mark next to the top box\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/webcontent\/flash\/boxplot2q1_q3.gif\" alt=\"boxplot graph with question mark next to the top box\" width=\"450\" height=\"450\" \/><\/p>\n<div id=\"h5p-26\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-26\" class=\"h5p-iframe\" data-content-id=\"26\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"2.3 Learn by doing 1\"><\/iframe><\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"boxplot graph with question mark next to the top line\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/webcontent\/flash\/boxplot2q1_none.gif\" alt=\"boxplot graph with question mark next 
to the top line\" width=\"450\" height=\"450\" \/><\/p>\n<div id=\"h5p-27\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-27\" class=\"h5p-iframe\" data-content-id=\"27\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"2.3 Learn by doing 2\"><\/iframe><\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"boxplot graph with question mark next to a star above the top line\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/webcontent\/flash\/boxplot2q1_max.gif\" alt=\"boxplot graph with question mark next to a star above the top line\" width=\"450\" height=\"450\" \/><\/p>\n<div id=\"h5p-28\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-28\" class=\"h5p-iframe\" data-content-id=\"28\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"2.3 Learn by doing 3\"><\/iframe><\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"boxplot graph with question mark next to the bottom line\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/webcontent\/flash\/boxplot2q1_min.gif\" alt=\"boxplot graph with question mark next to the bottom line\" width=\"450\" height=\"450\" \/><\/p>\n<div id=\"h5p-29\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-29\" class=\"h5p-iframe\" data-content-id=\"29\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"2.3 Learn by doing 4\"><\/iframe><\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/webcontent\/flash\/boxplot2q1_q1.gif\" alt=\"boxplot graph with question mark next to the 
bottom box\" width=\"500\" height=\"374\" \/><\/p>\n<div id=\"h5p-30\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-30\" class=\"h5p-iframe\" data-content-id=\"30\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"2.3 Learn by doing 5\"><\/iframe><\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"boxplot graph with question mark next to the middle line\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/webcontent\/flash\/boxplot2q1_m.gif\" alt=\"boxplot graph with question mark next to the middle line\" width=\"450\" height=\"450\" \/><\/p>\n<div id=\"h5p-31\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-31\" class=\"h5p-iframe\" data-content-id=\"31\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"2.3 Learn by doing 6\"><\/iframe><\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"boxplot graph with question mark next to red arrow going from bottom line to top star\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/webcontent\/flash\/boxplot2q1_range.gif\" alt=\"boxplot graph with question mark next to red arrow going from bottom line to top star\" width=\"450\" height=\"450\" \/><\/p>\n<div id=\"h5p-32\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-32\" class=\"h5p-iframe\" data-content-id=\"32\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"2.3 Learn by doing 7\"><\/iframe><\/div>\n<\/div>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"boxplot graph with question mark next to arrow marking the height of both boxes\" 
src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/webcontent\/flash\/boxplot2q1_iqr.gif\" alt=\"boxplot graph with question mark next to arrow marking the height of both boxes\" width=\"450\" height=\"450\" \/><\/p>\n<div id=\"h5p-33\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-33\" class=\"h5p-iframe\" data-content-id=\"33\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"2.3 Learn by doing 8\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<div><\/div>\n<\/div>\n<\/div>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p id=\"N1007C\">The boxplot below displays ratings for TV shows during sweeps week:<\/p>\n<div class=\"image shouldbeleft\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"boxplot graph for tv show ratings\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/boxplot10.gif\" alt=\"boxplot graph for tv show ratings\" \/><\/div>\n<div>\n<div id=\"h5p-34\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-34\" class=\"h5p-iframe\" 
data-content-id=\"34\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"2.3 Did I get this? 2\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"cf7fe0ff5d2540f9927123f2863ca048\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span title=\"Quick scroll up\">Side-By-Side (Comparative) Boxplots<\/span><\/h2>\n<p id=\"a4f537f74bff461aaf1dab3ae8a74c6c\">As we learned at the beginning of this module, the distribution of a quantitative variable is best represented graphically by a histogram. Boxplots are most useful when presented side-by-side for comparing and contrasting distributions from two or more groups.<\/p>\n<\/div>\n<\/div>\n<div id=\"ca9b36bab98e4036954a591969315be9\" class=\"examplewrap\">\n<div class=\"exHead\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<h4 class=\"exHead\">Best Actor\/Actress Oscar Winners<\/h4>\n<div class=\"example clearfix\">\n<p id=\"a7dbed46f0de4d0a94e5b6ea9dcfd556\">So far we have examined the age distributions of Oscar winners for males and females separately.<\/p>\n<p id=\"fe9af4e22afa4ad2b0a5b7d9b74a2c35\">It is also useful to\u00a0<em class=\"italic\">compare<\/em>\u00a0the age distributions of actors and actresses who won best acting Oscars. To do that, we will look at side-by-side boxplots of the age distributions by gender. Recall also that we found the five-number summaries and means for both distributions. For the Best Actress dataset, we did the calculations by hand. 
For the Best Actor dataset, we used statistical software, and here are the results:<\/p>\n<div class=\"Excel2019PC altContentOn\">\n<div class=\"alternative\">\n<ul id=\"b9b69db81c134c27bec90742ee3860cd\">\n<li>\n<p id=\"de6e4a27a2144c5a87cab2e99eabe8ea\">Actors: min = 31, Q1 = 38, M = 43.5, Q3 = 50.5, Max = 76<\/p>\n<\/li>\n<li>\n<p id=\"f264b0f3cfdb4c2aaacdb1326dd3369b\">Actresses: min = 21, Q1 = 30.5, M = 34.5, Q3 = 42, Max = 80<\/p>\n<\/li>\n<\/ul>\n<p id=\"bff1026cf49241c4b47636b7ce64753e\">Based on the graph and numerical measures, we can make the following comparison between the two distributions:<\/p>\n<p id=\"b7d00b85cece48a6b97bf3415303556b\"><em class=\"bold\">Center:<\/em>\u00a0The graph reveals that the age distribution of the males is centered at higher ages than the females\u2019 age distribution. The numerical measures support this: the median age for females (34.5) is lower than the median age for males (43.5). In fact, even the third quartile of the females\u2019 distribution (42) is lower than the median age for males. We therefore conclude that, in general, actresses win the Best Actress Oscar at a younger age than actors win the Best Actor Oscar.<\/p>\n<p id=\"eb69d02be1514d1da883f699fa140014\"><em class=\"bold\">Spread:<\/em>\u00a0Judging by the range of the data, there is more variability in the females\u2019 distribution (range = 80 \u2212 21 = 59) than in the males\u2019 distribution (range = 76 \u2212 31 = 45). On the other hand, if we look at the IQR, which measures the variability only among the middle 50% of the distribution, we see slightly more spread in the ages of males (IQR = 50.5 \u2212 38 = 12.5) than females (IQR = 42 \u2212 30.5 = 11.5). We conclude that, among all the winners, the actors\u2019 ages are more alike than the actresses\u2019 ages. 
However, the middle 50% of the age distribution of actresses is more homogeneous than the actors\u2019 age distribution.<\/p>\n<p id=\"ea420af3ae6743bda47a0013d7f1b1d9\"><em class=\"bold\">Outliers:<\/em>\u00a0Both distributions have outliers. There is only one high outlier in the actors\u2019 distribution (76, Henry Fonda,\u00a0<em class=\"italic\">On Golden Pond<\/em>), compared with three high outliers in the actresses\u2019 distribution.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div id=\"bc6c2f8cb179418e8b7f2f18af190b45\" class=\"examplewrap\">\n<div class=\"exHead\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<h4 class=\"exHead\">Temperature of Pittsburgh vs. San Francisco<\/h4>\n<div class=\"example clearfix\">\n<div>\n<p id=\"addd9131226a4b9fa9b98f4a4071b080\">To compare the average high temperatures in Pittsburgh with those in San Francisco, we will look at the following side-by-side boxplots and supplement the graph with the descriptive statistics of each of the two distributions.<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"b4efb570be4b4c21a236493cd8951c27\" class=\"img-responsive popimg\" title=\"A box plot titled &quot;Average High Temperature: San Francisco vs. Pittsburgh&quot;. The vertical axis is in units of Temperature (F), and it goes from 30-80. There are two box plots, one for Pittsburgh and one for San Francisco.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/boxplot8.gif\" alt=\"A box plot titled &quot;Average High Temperature: San Francisco vs. Pittsburgh&quot;. The vertical axis is in units of Temperature (F), and it goes from 30-80. 
There are two box plots, one for Pittsburgh and one for San Francisco.\" \/><\/span><\/span><\/p>\n<table id=\"fa637b2947c143bf9b6ec8a4116bb0bd_bx\" class=\"table labeled\">\n<tfoot>\n<tr>\n<td class=\"captionwrap\"><\/td>\n<\/tr>\n<\/tfoot>\n<tbody>\n<tr>\n<td>\n<table id=\"fa637b2947c143bf9b6ec8a4116bb0bd\" class=\"wbtable plain\">\n<thead>\n<tr>\n<th colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"abe5c26b236a64c92ab6fda92bd392400\">Statistic<\/p>\n<\/th>\n<th colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"ae3c42c26cbf84afab9ee7164b4a9bf44\">San Francisco<\/p>\n<\/th>\n<th colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"aa49bd1d25158497887686a8b78fe1793\">Pittsburgh<\/p>\n<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<th colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"aee6dfb08935c4374b37763cf4ef5bfea\">min<\/p>\n<\/th>\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"af2972a1e3f6042af959296958f93bd99\">56.3<\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"aaa8a50a5ec784f448a50e70ede0a5dbf\">33.7<\/p>\n<\/td>\n<\/tr>\n<tr class=\"e\">\n<th colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"abb71db9f5fe54afa966be95b12436084\">Q1<\/p>\n<\/th>\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"ab97c99caf6c6411b92e5b6980a507161\">60.2<\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"ad22ae158d2c24313bbea13e3a64929d2\">41.2<\/p>\n<\/td>\n<\/tr>\n<tr>\n<th colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"aa590a14a3fee4d9b8920737af53ff36c\">Median<\/p>\n<\/th>\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"af1e01b5d49314ebc9b9f654322ec61ea\">62.7<\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"aaf85996c37f44c64b764a0b99402a3bd\">61.4<\/p>\n<\/td>\n<\/tr>\n<tr class=\"e\">\n<th colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"ae2135487985f44bbbcaea05a6a62a4df\">Q3<\/p>\n<\/th>\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p 
id=\"ab903ec3c16a54101bb10f1808c3a147d\">65.35<\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"ab57b364a0f78478aac015300202fc5cc\">77.75<\/p>\n<\/td>\n<\/tr>\n<tr>\n<th colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"ac211a0c29c4746af8d177e561236a1a7\">Max<\/p>\n<\/th>\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"aacc9893314164721ae6c3d924bcb411f\">68.7<\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"aedd79e0c13de44f1ad84aa65538216a1\">82.6<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p id=\"b8e22ce18a4f47bdae55d024611b2b3f\">The graph makes both the similarities and the differences between the two distributions striking. Both distributions have roughly the same center (the medians are 61.4 for Pittsburgh and 62.7 for San Francisco). However, temperatures in Pittsburgh vary much more than temperatures in San Francisco (range: about 49 vs. about 12; IQR: about 36.5 vs. about 5).<\/p>\n<p id=\"aa0c792c85ae457cb8e0b3c621154eb8\">In practical terms, the weather in San Francisco is much more consistent than the weather in Pittsburgh, which varies a lot during the year. Because temperatures in San Francisco vary so little during the year, knowing that the median temperature is around 63 is very informative. On the other hand, knowing that the median temperature in Pittsburgh is around 61 is of little practical use, since temperatures there vary so much during the year and can be much warmer or much colder.<\/p>\n<p id=\"f44f0ee8953441d3a84a160a1ef8a69e\">Note that this example provides more intuition about variability by interpreting small variability as consistency, and large variability as lack of consistency. 
Also, this example shows that the center of a distribution is more meaningful as a typical value when there is little variability (or, as statisticians say, little \u201cnoise\u201d) around it. When there is large variability, the center loses its practical meaning as a typical value.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<h2><span title=\"Quick scroll up\">Let\u2019s Summarize<\/span><\/h2>\n<ul id=\"df69ce4741264998a2c991342adb74e8\">\n<li>\n<p id=\"ae0f5c9c45ca54e92b22a07d77187ab90\">The five-number summary of a distribution consists of the median (M), the two quartiles (Q1, Q3), and the extremes (min, Max).<\/p>\n<\/li>\n<li>\n<p id=\"af35b33ca7cc949769a97448e32a71f95\">The five-number summary provides a compact numerical description of a distribution. The median describes the center, and the extremes (which give the range) and the quartiles (which give the IQR) describe the spread.<\/p>\n<\/li>\n<li>\n<p id=\"ab47fb74b3d4b47b8993a0ee72c397d04\">The boxplot graphically represents the distribution of a quantitative variable by displaying the five-number summary and any observation classified as a suspected outlier by the 1.5(IQR) criterion.<\/p>\n<\/li>\n<li>\n<p id=\"adb8edfab278c44c1859910f4e1bb2a19\">Boxplots are most useful when presented side-by-side to compare and contrast distributions from two or more 
groups.<\/p>\n<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"author":150,"menu_order":10,"template":"","meta":{"pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[48],"contributor":[],"license":[],"class_list":["post-459","chapter","type-chapter","status-publish","hentry","chapter-type-numberless"],"part":413,"_links":{"self":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters\/459","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/users\/150"}],"version-history":[{"count":18,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters\/459\/revisions"}],"predecessor-version":[{"id":972,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters\/459\/revisions\/972"}],"part":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/parts\/413"}],"metadata":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters\/459\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/media?parent=459"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapter-type?post=459"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/contributor?post=459"},{"taxonomy":"license","embeddable":true,"hr
ef":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/license?post=459"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}