{"id":450,"date":"2024-10-18T01:32:31","date_gmt":"2024-10-18T01:32:31","guid":{"rendered":"https:\/\/pressbooks.ccconline.org\/mat1260\/?post_type=chapter&#038;p=450"},"modified":"2025-01-06T19:23:03","modified_gmt":"2025-01-06T19:23:03","slug":"2-1-organizing-and-visualizing-data","status":"publish","type":"chapter","link":"https:\/\/pressbooks.ccconline.org\/mat1260\/chapter\/2-1-organizing-and-visualizing-data\/","title":{"raw":"2.1: Organizing and Visualizing Data","rendered":"2.1: Organizing and Visualizing Data"},"content":{"raw":"As indicated in the introduction, we will begin the Exploratory Data Analysis part of the course by exploring (or looking at) one variable at a time.\r\n\r\nAs we saw in <a href=\"https:\/\/pressbooks.ccconline.org\/mat1260\/chapter\/1-2-data-types-and-levels-of-measurement\/\">Data Types and Levels of Measurement<\/a>, the data for each variable are a long list of values (whether numerical or not), and are not very informative in that form. In order to convert these raw data into useful information we need to summarize and then examine the\u00a0<em>distribution<\/em>\u00a0of the variable. By\u00a0<em>distribution<\/em>\u00a0of a variable, we mean:\r\n<ul>\r\n \t<li>what values the variable takes, and<\/li>\r\n \t<li>how often the variable takes those values.<\/li>\r\n<\/ul>\r\nThis chapter has two sections. We will first learn how to summarize and examine the distribution of a single categorical variable, and then do the same for a quantitative variable.\r\n<h2><span style=\"color: #800080;\">Organizing One Categorical Variable<\/span><\/h2>\r\n<p id=\"ebc684a34ec14c99806984fa95c2fc4c\">What is your perception of your own body? Do you feel that you are overweight, underweight, or about right?<\/p>\r\n<p id=\"d237e9121f28484cba1c00fed8bd6157\">A random sample of 1,200 U.S. college students were asked this question as part of a larger survey. 
The following table shows part of the responses:<\/p>\r\n\r\n<table id=\"f015b12e7e6740a0a03a9830cd268dfe_bx\" class=\"table labeled\">\r\n<thead>\r\n<tr>\r\n<th>Body Image<\/th>\r\n<\/tr>\r\n<\/thead>\r\n<tfoot>\r\n<tr>\r\n<td class=\"captionwrap\"><\/td>\r\n<\/tr>\r\n<\/tfoot>\r\n<tbody>\r\n<tr>\r\n<td>\r\n<table id=\"f015b12e7e6740a0a03a9830cd268dfe\" class=\"wbtable plain\">\r\n<thead>\r\n<tr>\r\n<th colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"abfbeeaef537143978827d5090a8d3b95\">Student<\/p>\r\n<\/th>\r\n<th colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"ab91d2e3725044630b0329a323928a157\">Body Image<\/p>\r\n<\/th>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr>\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"aa9f51764e276490caa7b986f90e24791\">student 25<\/p>\r\n<\/td>\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"ac043b9fcf1434a00b9a1e3c44f33865c\">overweight<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr class=\"e\">\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"aeed717a4023f42368f2288c3e1732920\">student 26<\/p>\r\n<\/td>\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"ae2450789be0d48599a11a8f6bdc3174e\">about right<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"aa90e21f0004b468cb4c98a0a712986b7\">student 27<\/p>\r\n<\/td>\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"acde29c624aa64dc396827e23c46030b7\">underweight<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr class=\"e\">\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"aa0393977ae1c42d58b64f3d41592c6e3\">student 28<\/p>\r\n<\/td>\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"aea1008255fe9494394a26a94cdd8fbbb\">about right<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"afd9dada760714c588dd0072424d6b2b7\">student 29<\/p>\r\n<\/td>\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p 
id=\"ad06450ac245a44df8950ad069d01e675\">about right<\/p>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<p id=\"f6eed5f74abb4c27a93ea601d88fc7ad\">Here is some information that would be interesting to get from these data:<\/p>\r\n\r\n<ul id=\"b6a4cebd0b2e418991e418bea82f4fe7\">\r\n \t<li>\r\n<p id=\"ad39c42534ad643df86cd8990fbd105d6\">What percentage of the sampled students fall into each category?<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"ab783a0c6695f469dbae26992a2e5784a\">How are students divided across the three body image categories? Are they equally divided? If not, do the percentages follow some other kind of pattern?<\/p>\r\n<\/li>\r\n<\/ul>\r\n<p id=\"b66d885462974431a9ff4422ad787aaa\">There is no way that we can answer these questions by looking at the raw data, which are in the form of a long list of 1,200 responses, and thus not very useful. However, both these questions will be easily answered once we summarize and look at the\u00a0<em class=\"italic\">distribution<\/em>\u00a0of the variable Body Image (i.e., once we summarize how often each of the categories occurs).<\/p>\r\n<p id=\"ab6d04e44e5d45b09d456bf0e63755d5\">In order to summarize the distribution of a categorical variable, we first create a table of the different values (categories) the variable takes, how many times each value occurs (count) and, more importantly, how often each value occurs (by converting the counts to percentages); this table is called a frequency distribution. 
Here is the frequency distribution for our example:<\/p>\r\n\r\n<table id=\"a9fb40630d43453db0dc8e77bc6c3f0c_bx\" class=\"table labeled\">\r\n<thead>\r\n<tr>\r\n<th>Body Image Distribution<\/th>\r\n<\/tr>\r\n<\/thead>\r\n<tfoot>\r\n<tr>\r\n<td class=\"captionwrap\"><\/td>\r\n<\/tr>\r\n<\/tfoot>\r\n<tbody>\r\n<tr>\r\n<td>&nbsp;\r\n<table id=\"a9fb40630d43453db0dc8e77bc6c3f0c\" class=\"wbtable plain\" style=\"height: 135px;\">\r\n<thead>\r\n<tr style=\"height: 27px;\">\r\n<th style=\"height: 27px; width: 78.725px;\" colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"aab5e9139fb104133b58e648acd2d1848\">Category<\/p>\r\n<\/th>\r\n<th style=\"height: 27px; width: 40.75px;\" colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"af5df22733e3d43cdbaed6706d1043e2b\">Count<\/p>\r\n<\/th>\r\n<th style=\"height: 27px; width: 311.25px;\" colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"ab49b0b5b9dda492780471232d20a2626\">Percent<\/p>\r\n<\/th>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr style=\"height: 27px;\">\r\n<td style=\"height: 27px; width: 79.125px;\" colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"adfadaaf470244f98a86ce881508487c8\">About right<\/p>\r\n<\/td>\r\n<td style=\"height: 27px; width: 41.55px;\" colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"aab825b4b284e472c961d7fab1be94b26\">855<\/p>\r\n<\/td>\r\n<td style=\"height: 27px; width: 311.65px;\" colspan=\"1\" rowspan=\"1\" align=\"left\">[latex]\\left(\\frac{855}{1200}\\right)\\times100=71.3\\%[\/latex]<\/td>\r\n<\/tr>\r\n<tr class=\"e\" style=\"height: 27px;\">\r\n<td style=\"height: 27px; width: 79.125px;\" colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"ad07d01fb459347c4b7903d469bc6261b\">Overweight<\/p>\r\n<\/td>\r\n<td style=\"height: 27px; width: 41.55px;\" colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"aa3811b37691747f18dfbca1a5c4052d7\">235<\/p>\r\n<\/td>\r\n<td style=\"height: 27px; width: 311.65px;\" colspan=\"1\" rowspan=\"1\" 
align=\"left\">[latex]\\left(\\frac{235}{1200}\\right)\\times100=19.6\\%[\/latex]<\/td>\r\n<\/tr>\r\n<tr style=\"height: 27px;\">\r\n<td style=\"height: 27px; width: 79.125px;\" colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"ab335cfce15d743538fb8acc94e4a5dea\">Underweight<\/p>\r\n<\/td>\r\n<td style=\"height: 27px; width: 41.55px;\" colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"acb5d578f8a9d44519c3b5d2e86b39900\">110<\/p>\r\n<\/td>\r\n<td style=\"height: 27px; width: 311.65px;\" colspan=\"1\" rowspan=\"1\" align=\"left\">[latex]\\left(\\frac{110}{1200}\\right)\\times100=9.2\\%[\/latex]<\/td>\r\n<\/tr>\r\n<tr class=\"e\" style=\"height: 27px;\">\r\n<td style=\"height: 27px; width: 79.125px;\" colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"b33c9d57b32143e79950b35f201ef6d1\"><em class=\"italic\">Total<\/em><\/p>\r\n<\/td>\r\n<td style=\"height: 27px; width: 41.55px;\" colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"bfbea44f32d44514ad266b1acec495f7\"><em class=\"italic\">n=1200<\/em><\/p>\r\n<\/td>\r\n<td style=\"height: 27px; width: 311.65px;\" colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"a162d5ffe2ba4caa9e3b536864983fa6\"><em class=\"italic\">100%<\/em><\/p>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<p id=\"b309adaa461a4fb8a1e3a7fe56a88157\">In order to visualize the numerical summaries we\u2019ve obtained, we need a graphical display. There are two simple graphical displays for visualizing the distribution of categorical data:<\/p>\r\n\r\n<ol id=\"abda342a14a44c419d7cbcece901ab7b\">\r\n \t<li>\r\n<p id=\"ac374642c375849bcb49d4ff7b9598731\">The Pie Chart<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"fa916b9abbb9431388d5aa8be3a274d0\" class=\"img-responsive popimg aligncenter\" title=\"A pie chart of the distribution. 
Taking up 71.3% of the chart is the &amp;quot;about right&amp;quot; category, which is labeled with &amp;quot;about right (855, 71.3%)&amp;quot;. Another 9.2% of the chart is occupied by the section labeled &amp;quot;underweight (110, 9.2%)&amp;quot;, and taking up 19.6% of the chart is the area labeled &amp;quot;overweight (235, 19.6%)&amp;quot;. In total the three sections fill up the entire pie, so they make up 100% of the chart, which represents the entirety of the data.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/one_cat_var1.gif\" alt=\"A pie chart of the distribution. Taking up 71.3% of the chart is the &amp;quot;about right&amp;quot; category, which is labeled with &amp;quot;about right (855, 71.3%)&amp;quot;. Another 9.2% of the chart is occupied by the section labeled &amp;quot;underweight (110, 9.2%)&amp;quot;, and taking up 19.6% of the chart is the area labeled &amp;quot;overweight (235, 19.6%)&amp;quot;. In total the three sections fill up the entire pie, so they make up 100% of the chart, which represents the entirety of the data.\" \/><\/span><\/span><\/li>\r\n \t<li>The Bar Chart\r\n<p id=\"ac2ae7e5cabb74d85ab0dfdfb9278441c\"><span class=\"imagewrap\"><span class=\"image\"><img id=\"ef776cb1541a4069a28b950dea8dcb9c\" class=\"img-responsive popimg aligncenter\" title=\"Two bar charts. Since these bar charts can only show one type of unit on the vertical axis, two are required, one to show counts and one to show percentages. The first bar chart shows counts on the vertical axis, from 0 to 900. The horizontal axis has 3 labels under 3 bars. The first bar is labeled &amp;quot;about right&amp;quot; and is the tallest. It extends from the 0 mark on the vertical axis to between the 800 and 900 mark. The second bar is labeled &amp;quot;overweight&amp;quot; and starts at the 0 mark and ends at about the 200 mark. 
The third bar is labeled &amp;quot;underweight&amp;quot; and starts at the 0 mark and ends between the 100 and 200 mark. The second bar chart is identical to the first one, except the vertical axis has been changed to Percent units, and goes from 0 to 70. The bars are the same as in the first chart.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/one_cat_var2.gif\" alt=\"Two bar charts. Since these bar charts can only show one type of unit on the vertical axis, two are required, one to show counts and one to show percentages. The first bar chart shows counts on the vertical axis, from 0 to 900. The horizontal axis has 3 labels under 3 bars. The first bar is labeled &amp;quot;about right&amp;quot; and is the tallest. It extends from the 0 mark on the vertical axis to between the 800 and 900 mark. The second bar is labeled &amp;quot;overweight&amp;quot; and starts at the 0 mark and ends at about the 200 mark. The third bar is labeled &amp;quot;underweight&amp;quot; and starts at the 0 mark and ends between the 100 and 200 mark. The second bar chart is identical to the first one, except the vertical axis has been changed to Percent units, and goes from 0 to 70. 
The bars are the same as in the first chart.\" \/><\/span><\/span><\/p>\r\n<\/li>\r\n<\/ol>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<p class=\"textbox__title\">Did I get this?<\/p>\r\n\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\n[h5p id=\"12\"]\r\n\r\nNow that we have summarized the distribution of values in the Body Image variable, let's go back and interpret the results in the context of the questions that we posed:\r\n\r\n[h5p id=\"13\"]\r\n\r\n<\/div>\r\n<\/div>\r\n<p id=\"d125fb9c155a48778c9c0fe9d672f12c\">Now that we've interpreted the results, there are some other interesting questions that arise:<\/p>\r\n\r\n<ul id=\"fbcf0af0011b4700b43cac6d02a9f1f8\">\r\n \t<li>\r\n<p id=\"aa99dcaeebbe64ae1b5c9324d51ab0056\">Can we reliably generalize our results to the entire population of interest and conclude that a similar distribution across body image categories exists among all U.S. college students? In particular, can we make such a generalization even though our sample consisted of only 1,200 students, which is a very small fraction of the entire population?<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"af2dc42d13ad34e89855883d8a773efbd\">If we had separated our sample by gender and looked at males and females separately, would we have found a similar distribution across body image categories?<\/p>\r\n<\/li>\r\n<\/ul>\r\n<p id=\"eaea6962a663463b9f271f515a27f826\">These are the types of questions that we will deal with in future sections of the course.<\/p>\r\n\r\n<div id=\"f0971b190bec4f1fb006fd91a77fbee7\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span style=\"color: #800080;\" title=\"Quick scroll up\">Comments<\/span><\/h2>\r\n<ol id=\"db72978a062640c9ada6ef8460b9d83e\">\r\n \t<li>\r\n<p id=\"e0b6496fee5443dd8b12536ed1204035\">While both the pie chart and the bar chart help us visualize the distribution of a categorical variable, the pie chart emphasizes how the different categories relate to the 
whole, and the bar chart emphasizes how the different categories compare with each other.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"ca012d8d28ae44649c8dbd32e816f018\">A variation on the pie chart and bar chart that is very commonly used in the media is the\u00a0<em class=\"italic\">pictogram<\/em>. Here are two examples:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"fc8ff3bfd5734e10b4aabeacf98c1fe2\" class=\"img-responsive popimg aligncenter\" title=\"A bar chart in which the bars have been replaced by rolls of unraveled toilet paper. The chart is titled &amp;quot;How we flush a public toilet&amp;quot; The first bar is labeled &amp;quot;Use shoe, 41%&amp;quot;, the second bar is labeled &amp;quot;Act normally 30%&amp;quot;, and the last bar is labeled &amp;quot;Paper towel 17%&amp;quot;\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/one_cat_var3.jpg\" alt=\"A bar chart in which the bars have been replaced by rolls of unraveled toilet paper. The chart is titled &amp;quot;How we flush a public toilet&amp;quot; The first bar is labeled &amp;quot;Use shoe, 41%&amp;quot;, the second bar is labeled &amp;quot;Act normally 30%&amp;quot;, and the last bar is labeled &amp;quot;Paper towel 17%&amp;quot;\" \/><\/span><\/span>\r\n<p id=\"ed85abcda61b4888b1f8845d3f7b641d\">Source: USA Today Snapshots and the Impulse Research for Northern Confidential Bathroom survey<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"d6f6d5bacdc541eb9377c547261c5a3c\" class=\"img-responsive popimg aligncenter\" title=\"A pie chart made out of a slice of cucumber. The cucumber is on a fork, which in turn is over a dinner table. The pie chart is titled &amp;quot;How often are salads eaten (per week)&amp;quot;. 
The pie chart shows 4 sections: Never (3%), Daily (13%), 2 or less (37%), 3-6 times (47%).\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/one_cat_var4.jpg\" alt=\"A pie chart made out of a slice of cucumber. The cucumber is on a fork, which in turn is over a dinner table. The pie chart is titled &amp;quot;How often are salads eaten (per week)&amp;quot;. The pie chart shows 4 sections: Never (3%), Daily (13%), 2 or less (37%), 3-6 times (47%).\" \/><\/span><\/span>\r\n<p id=\"b2b144c9830c4827af54120708e69678\">Source: Market Facts for the Association of Dressings and Sauces<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"e60efc22f42b4710a4d1956442dd3981\"><em class=\"italic\">Beware:<\/em>\u00a0Pictograms can be misleading. Consider the following pictogram:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"ab43bf06ac4f442681162e3938ff280e\" class=\"img-responsive popimg aligncenter\" title=\"A chart in which three items are represented by the size of a fountain pen. The chart is labeled &amp;quot;No. 1 for the Money with Consumer Services Advertisers&amp;quot; The smallest pen is U.S. News $1,537,617. The second smallest pen is Newsweek $2,698,386. The largest pen is TIME $4,433,879.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/one_cat_var5.jpg\" alt=\"A chart in which three items are represented by the size of a fountain pen. The chart is labeled &amp;quot;No. 1 for the Money with Consumer Services Advertisers&amp;quot; The smallest pen is U.S. News $1,537,617. The second smallest pen is Newsweek $2,698,386. 
The largest pen is TIME $4,433,879.\" \/><\/span><\/span>\r\n<p id=\"ef8253b1fb87419591ea6e7aac90a7c4\">This graph is aimed at advertisers deciding where to spend their budgets, and clearly suggests that\u00a0<em class=\"italic\">Time<\/em>\u00a0magazine attracts by far the largest amount of advertising spending. Are the differences really as dramatic as the graph suggests? If we look carefully at the numbers above the pens, we find that advertisers spend only $4,433,879 \/ $2,698,386 = 1.64 times as much in\u00a0<em class=\"italic\">Time<\/em>\u00a0as in\u00a0<em class=\"italic\">Newsweek<\/em>, and only $4,433,879 \/ $1,537,617 = 2.88 times as much as in\u00a0<em class=\"italic\">U.S. News<\/em>. By looking at the pictogram, however, we get the impression that\u00a0<em class=\"italic\">Time<\/em>\u00a0is much further ahead. Why? In order to magnify the picture without distorting it, we must increase\u00a0<em class=\"italic\">both<\/em>\u00a0its height and width. As a result, the\u00a0<em class=\"italic\">area<\/em>\u00a0of\u00a0<em class=\"italic\">Time\u2019s<\/em>\u00a0pen is 1.64 * 1.64 = 2.7 times as large as the\u00a0<em class=\"italic\">Newsweek<\/em>\u00a0pen, and 2.88 * 2.88 = 8.3 times as large as the\u00a0<em class=\"italic\">U.S. News<\/em>\u00a0pen. 
Our eyes capture the area of the pens rather than only the height, and so we are misled to think that\u00a0<em class=\"italic\">Time<\/em>\u00a0is a bigger winner than it really is.<\/p>\r\n<\/li>\r\n<\/ol>\r\n<\/div>\r\n<\/div>\r\n<div class=\"activitywrap purpose learnbydoing wbactivity\">\r\n<h2><span style=\"color: #800080;\">Let\u2019s Summarize<\/span><\/h2>\r\n<\/div>\r\n<div id=\"c54fda04eef54dc1a369332fa1c4a0ba\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<ul id=\"e511ec6181824e61ada9094531c6b7f4\">\r\n \t<li>\r\n<p id=\"afc70ab5c43594db3b7696417ba1be701\">The distribution of a categorical variable is summarized using:<\/p>\r\n\r\n<ul id=\"eba3f6877c9c496dad87d40bcdb3413d\">\r\n \t<li>\r\n<p id=\"b8847c82abb044eea19694085c110d34\"><em class=\"italic\">Graphical display:<\/em>\u00a0pie chart or bar chart, supplemented by<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"cd35655682f64872a5e1006500a17948\"><em class=\"italic\">Numerical summaries:<\/em>\u00a0category counts and percentages.<\/p>\r\n<\/li>\r\n<\/ul>\r\n<\/li>\r\n \t<li>\r\n<p id=\"ad0c0de81479243c6ab96ca678f92e7f3\">A variation on pie charts and bar charts is the pictogram.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"adbb8c6f365194e9ab18ee48d83f5a056\">Pictograms can be misleading, so make sure to use a critical approach when interpreting the information the pictogram is trying to convey.<\/p>\r\n<\/li>\r\n<\/ul>\r\n<h2><span style=\"color: #800080;\">Organizing One Quantitative Variable<\/span><\/h2>\r\n<p id=\"c1be7527b17a41cc81f4f6a48da1c290\">In the previous section, we explored the distribution of a categorical variable using graphs (pie chart, bar chart) supplemented by numerical measures (percent of observations in each category). In this section, we will explore the data collected from a\u00a0<em>quantitative\u00a0<\/em>variable, and learn how to describe and summarize the important features of its distribution. 
We will first learn how to display the distribution using graphs and then move on to discuss numerical measures.<\/p>\r\nTo display data from one quantitative variable graphically, we can use either the\u00a0<em>histogram<\/em>\u00a0or the\u00a0<i>stem plot<\/i>. (Another graph, the\u00a0<em>boxplot<\/em>, will be mentioned later).\r\n<div id=\"lobjh\" class=\"\">\r\n<h3>Idea<\/h3>\r\n<\/div>\r\n<div id=\"N10B27\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<p id=\"N10B2E\">Break the range of values into intervals and count how many observations fall into each interval.<\/p>\r\n\r\n<div class=\"examplewrap\">\r\n<div class=\"example clearfix\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<h4>Exam Grades<\/h4>\r\n<div>\r\n<p id=\"N10B36\">Here are the exam grades of 15 students:<\/p>\r\n\r\n<table class=\"formula\">\r\n<tbody>\r\n<tr>\r\n<td>88, 48, 60, 51, 57, 85, 69, 75, 97, 72, 71, 79, 65, 63, 73<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n&nbsp;\r\n<p id=\"N10B3D\">We first need to break the range of values into intervals (also called \u201cbins\u201d or \u201cclasses\u201d). In this case, since our dataset consists of exam scores, it will make sense to choose intervals that typically correspond to the range of a letter grade, 10 points wide: 40\u201350, 50\u201360, \u2026 90\u2013100. 
By counting how many of the 15 observations fall in each of the intervals, we get the following table:<\/p>\r\n\r\n<table id=\"N10B40_bx\" class=\"table labeled\">\r\n<thead>\r\n<tr>\r\n<th>Exam Grades<\/th>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr>\r\n<td>\r\n<table class=\"wbtable\">\r\n<thead>\r\n<tr>\r\n<th>Score<\/th>\r\n<th>Count<\/th>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr>\r\n<td>[40\u201350)<\/td>\r\n<td>1<\/td>\r\n<\/tr>\r\n<tr class=\"e\">\r\n<td>[50\u201360)<\/td>\r\n<td>2<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>[60\u201370)<\/td>\r\n<td>4<\/td>\r\n<\/tr>\r\n<tr class=\"e\">\r\n<td>[70\u201380)<\/td>\r\n<td>5<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>[80\u201390)<\/td>\r\n<td>2<\/td>\r\n<\/tr>\r\n<tr class=\"e\">\r\n<td>[90\u2013100]<\/td>\r\n<td>1<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<p id=\"N10B85\">Note: The observation 60 was counted in the 60\u201370 interval. See comment 1 below.<\/p>\r\nTo construct the histogram from this table we plot the intervals on the X-axis, and show the number of observations in each interval (frequency of the interval) on the Y-axis, which is represented by the height of a rectangle located above the interval:\r\n\r\n<\/div>\r\n<div>\r\n<div class=\"wp-nocaption alignnone wp-image-380 size-full\"><img class=\"wp-image-380 size-full aligncenter\" src=\"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-content\/uploads\/sites\/206\/2024\/09\/newplot-2.png\" alt=\"\" width=\"1795\" height=\"851\" \/><\/div>\r\n<p id=\"N10B93\">The table above can also be turned into a relative frequency table using the following steps:<\/p>\r\n\r\n<ol class=\"decimal\">\r\n \t<li>Add a row on the bottom and include the total number of observations in the dataset that are represented in the table.<\/li>\r\n \t<li>Add a column, at the end of the table, and calculate the relative frequency for each interval, by dividing the number of observations in each row by the total number of 
observations.<\/li>\r\n<\/ol>\r\n<p id=\"N10BA0\">These two steps are illustrated in red in the following frequency distribution table:<\/p>\r\n<img id=\"_i_1\" class=\"aligncenter\" title=\"\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/feq_table.png\" alt=\"\" width=\"750\" \/>\r\n<p id=\"N10BA8\">It is also possible to determine the number of scores for an interval, if you have the total number of observations and the relative frequency for that interval. For instance, suppose there are 15 scores (or observations) in a set of data and the relative frequency for an interval is .13. To determine the number of scores in that interval, multiply the total number of observations by the relative frequency and round to the nearest whole number: 15*.13 = 1.95, which rounds to 2 observations.<\/p>\r\n<p id=\"N10BAB\">A relative frequency table, like the one above, can be used to determine the frequency of scores occurring at or across intervals. Here are some examples, using the above frequency table:<\/p>\r\n\r\n<ul class=\"decimal\">\r\n \t<li>What is the percentage of exam scores that were 70 and up to, but not including, 80?\r\n<ul class=\"decimal\">\r\n \t<li>To determine the answer, we look at the relative frequency associated with the [70-80) interval. The relative frequency is .33; to convert to a percentage, multiply by 100 (.33*100 = 33), or 33%.<\/li>\r\n<\/ul>\r\n<\/li>\r\n \t<li>What is the percentage of exam scores that are at least 70? To determine the answer, we need to:\r\n<ul class=\"decimal\">\r\n \t<li>Add together the relative frequencies for the intervals that contain scores of at least 70. Thus, we add the relative frequencies for [70-80), [80-90), and [90-100]: .33+.13+.07 = .53.<\/li>\r\n \t<li>To get the percentage, multiply the calculated relative frequency by 100. 
In this case, it would be .53*100 = 53, or 53%.<\/li>\r\n<\/ul>\r\n<\/li>\r\n<\/ul>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p id=\"N10BC9\">Here is the table from above; use it to answer the question.<\/p>\r\n\r\n<table id=\"N10BCC_bx\" class=\"table labeled \">\r\n<thead>\r\n<tr>\r\n<th>Exam Grades<\/th>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr>\r\n<td>\r\n<table class=\"wbtable \" cellspacing=\"0\" align=\"center\">\r\n<thead>\r\n<tr>\r\n<th>Score<\/th>\r\n<th>Count<\/th>\r\n<\/tr>\r\n<\/thead>\r\n<tbody>\r\n<tr>\r\n<td>[40\u201350)<\/td>\r\n<td>1<\/td>\r\n<\/tr>\r\n<tr class=\"e\">\r\n<td>[50\u201360)<\/td>\r\n<td>2<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>[60\u201370)<\/td>\r\n<td>4<\/td>\r\n<\/tr>\r\n<tr class=\"e\">\r\n<td>[70\u201380)<\/td>\r\n<td>5<\/td>\r\n<\/tr>\r\n<tr>\r\n<td>[80\u201390)<\/td>\r\n<td>2<\/td>\r\n<\/tr>\r\n<tr class=\"e\">\r\n<td>[90\u2013100]<\/td>\r\n<td>1<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n[h5p id=\"14\"]\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Many students wonder...<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\n<strong>Question:<\/strong> How do I know what interval width to choose?\r\n\r\n<strong>Answer:<\/strong> There are no right or wrong choices of interval widths. 
In this course, we will rely on a statistical package to produce the histogram for us, and we will focus instead on describing and summarizing the distribution as it appears from the histogram.\r\n\r\n<\/div>\r\n<\/div>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p id=\"N1007C\">An instructor asked her students how much time (to the nearest hour) they spent studying for the midterm. The data are displayed in the following histogram:<\/p>\r\n\r\n<div class=\"image shouldbeleft\"><img id=\"N1007E\" class=\"img-responsive popimg aligncenter\" title=\"histogram of time students spend studying\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/histogram9.gif\" alt=\"histogram of time students spend studying\" \/><\/div>\r\n<div>[h5p id=\"15\"]<\/div>\r\n<\/div>\r\n<\/div>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p id=\"N1007C\">Thirty-two students were asked the number of servings of fruits and vegetables they eat daily. 
The results are displayed in the histogram below.<\/p>\r\n\r\n<div class=\"image shouldbeleft\"><img id=\"N1007E\" class=\"img-responsive popimg aligncenter\" title=\"histogram of fruit and vegetable consumption\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/didigetthis_histogram1_02.gif\" alt=\"histogram of fruit and vegetable consumption\" \/><\/div>\r\n<div>[h5p id=\"16\"]<\/div>\r\n<\/div>\r\n<\/div>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<p id=\"N1007C\">A survey was conducted to see how many phone calls people made daily. The results are displayed in the table below:<\/p>\r\n\r\n<div class=\"image shouldbeleft\"><img id=\"N1007E\" class=\"img-responsive popimg aligncenter\" title=\"table of calls made daily\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/didigetthis_histogram1_ep2.gif\" alt=\"table of calls made daily\" \/><\/div>\r\n<div>[h5p id=\"17\"]<\/div>\r\n<\/div>\r\n<\/div>\r\n<div>\r\n<div id=\"dc8ac8a1b70b466f8fd70cfe014c3e97\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span style=\"color: #800080;\" title=\"Quick scroll up\">Interpreting the Histogram<\/span><\/h2>\r\n<p id=\"beb021e50fd5468180a67ed771699bf0\">Once the distribution has been displayed graphically, we can describe the overall pattern of the distribution and mention any striking deviations from that pattern. 
More specifically, we should consider the following features of the distribution:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"b4e2ff04627446e880960f847d1245bc\" class=\"img-responsive popimg aligncenter\" title=\"The overall pattern of the distribution can be described by the shape, center, and spread of the histogram. Outliers in the distribution are deviations from the pattern.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/histogram10.gif\" alt=\"The overall pattern of the distribution can be described by the shape, center, and spread of the histogram. Outliers in the distribution are deviations from the pattern.\" \/><\/span><\/span>\r\n<p id=\"efbc0dcb2c7b4d0fbc3cf24d15b73205\">We will get a sense of the overall pattern of the data from the histogram\u2019s center, spread and shape, while outliers will highlight deviations from that pattern.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div id=\"e95531b59c8c4b56b48a7bde1bec2e79\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h3><strong><span style=\"color: #800080;\" title=\"Quick scroll up\">Distribution Shapes<\/span><\/strong><\/h3>\r\n<p id=\"b3b214032c344e798cd3389e2f814e3f\">When describing the shape of a distribution, we should consider:<\/p>\r\n\r\n<ol id=\"f77a34ce7f1b4d58bb59518130160b66\">\r\n \t<li>\r\n<p id=\"c7b5397efe644672b73c6a6bcd335083\"><em class=\"italic\">Symmetry\/skewness<\/em>\u00a0of the distribution.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"f7be218fe12f49c2ac8aa2e1b0f136f0\"><em class=\"italic\">Peakedness (modality)<\/em>\u2014the number of peaks (modes) the distribution has.<\/p>\r\n<\/li>\r\n<\/ol>\r\n<p id=\"b1cc64e3c34649daa9315f3a2cc99cf2\">We distinguish between:<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div id=\"c03b988909814fba9822086b281fc212\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span style=\"color: #800080;\" title=\"Quick scroll 
up\">Symmetric Distributions<\/span><\/h2>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"a52de33e5d4a4816b0266b016e97cac2\" class=\"img-responsive popimg aligncenter\" title=\"A symmetric, Single-peaked (Unimodal) distribution. The histogram&amp;apos;s bars start at low values close to 0 on the left and rise to a peak where the x-axis is labeled 10. Then, the values decrease as we go right, back down to nearly 0.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/histogram2.gif\" alt=\"A symmetric, Single-peaked (Unimodal) distribution. The histogram&amp;apos;s bars start at low values close to 0 on the left and rise to a peak where the x-axis is labeled 10. Then, the values decrease as we go right, back down to nearly 0.\" \/><\/span><\/span><span class=\"imagewrap\"><span class=\"image\"><img id=\"ba15ccd0954243018c03e3d98e9a4bb3\" class=\"img-responsive popimg aligncenter\" title=\"A symmetric, Double-peaked (Bimodal) distribution. The histogram&amp;apos;s bars start at low values close to 0 on the left and rise to the first peak where the x-axis is labeled 10. Then, the values decrease as we go right, back down to nearly 0 at roughly where x=15. The values increase again and peak at x=20, and then, continuing right, decrease to nearly 0.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/histogram3.gif\" alt=\"A symmetric, Double-peaked (Bimodal) distribution. The histogram&amp;apos;s bars start at low values close to 0 on the left and rise to the first peak where the x-axis is labeled 10. Then, the values decrease as we go right, back down to nearly 0 at roughly where x=15. 
The values increase again and peak at x=20, and then, continuing right, decrease to nearly 0.\" \/><\/span><\/span><span class=\"imagewrap\"><span class=\"image\"><img id=\"e7980f570e654c5db953807ec1b7d7e5\" class=\"img-responsive popimg aligncenter\" title=\"A symmetric, Uniform distribution. Throughout the entire range of the x-axis the bars are roughly the same height, meaning they are the same value.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/histogram4.gif\" alt=\"A symmetric, Uniform distribution. Throughout the entire range of the x-axis the bars are roughly the same height, meaning they are the same value.\" \/><\/span><\/span>\r\n<p id=\"b76b97c18d484ee0aeaa6293116dabe5\">Note that all three distributions are symmetric, but are different in their modality (peakedness). The first distribution is\u00a0<em class=\"italic\">unimodal<\/em>\u2014it has one mode (roughly at 10) around which the observations are concentrated. The second distribution is\u00a0<em class=\"italic\">bimodal<\/em>\u2014it has two modes (roughly at 10 and 20) around which the observations are concentrated. The third distribution is kind of flat, or\u00a0<em class=\"italic\">uniform<\/em>. The distribution has no modes, or no value around which the observations are concentrated. Rather, we see that the observations are roughly uniformly distributed among the different values.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div id=\"aa2f9127d8604ca3abb4d8f6e6c653e6\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span style=\"color: #800080;\" title=\"Quick scroll up\">Skewed Right Distributions<\/span><\/h2>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"e1971efb01d14d988175116a5ef10ad0\" class=\"img-responsive popimg aligncenter\" title=\"A Skewed-right histogram. 
As we proceed from left to right across the x-axis, the bars rapidly increase to the peak of the histogram, located at roughly x=33. From there, the values slowly decrease, and the last measurement is at x=200. The bars of the histogram are barely visible above the x-axis starting at about x=150.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/histogram5.gif\" alt=\"A Skewed-right histogram. As we proceed from left to right across the x-axis, the bars rapidly increase to the peak of the histogram, located at roughly x=33. From there, the values slowly decrease, and the last measurement is at x=200. The bars of the histogram are barely visible above the x-axis starting at about x=150.\" \/><\/span><\/span>\r\n<p id=\"f69173e4b42448d2ac25c71075c7f9a9\">A distribution is called\u00a0<em class=\"italic\">skewed right<\/em>\u00a0if, as in the histogram above, the right tail (larger values) is much longer than the left tail (small values). Note that in a skewed right distribution, the bulk of the observations are small\/medium, with a few observations that are much larger than the rest. An example of a real-life variable that has a skewed right distribution is salary. Most people earn in the low\/medium range of salaries, with a few exceptions (CEOs, professional athletes etc.) that are distributed along a large range (long \u201ctail\u201d) of higher values.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div id=\"c2f3875842da402c8acd5aa32fed43ac\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span style=\"color: #800080;\" title=\"Quick scroll up\">Skewed Left Distributions<\/span><\/h2>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"e019677beb8e4f499535530b77a0e6ab\" class=\"img-responsive popimg aligncenter\" title=\"A Skewed-Left histogram. 
As we proceed from left to right across the x-axis, the bars rise slowly to the peak of the histogram, located at roughly x=78. From there, the values rapidly decrease, and the last measurement is at x=90. Since the X-axis starts at 0, the peak is offset to the right of the center of the histogram.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/histogram6.gif\" alt=\"A Skewed-Left histogram. As we proceed from left to right across the x-axis, the bars rise slowly to the peak of the histogram, located at roughly x=78. From there, the values rapidly decrease, and the last measurement is at x=90. Since the X-axis starts at 0, the peak is offset to the right of the center of the histogram.\" \/><\/span><\/span>\r\n<p id=\"dadce93b96954eceae0f36f6e1f7dd42\">A distribution is called\u00a0<em class=\"italic\">skewed left<\/em>\u00a0if, as in the histogram above, the left tail (smaller values) is much longer than the right tail (larger values). Note that in a skewed left distribution, the bulk of the observations are medium\/large, with a few observations that are much smaller than the rest. An example of a real-life variable that has a skewed left distribution is age of death from natural causes (heart disease, cancer etc.). Most such deaths happen at older ages, with fewer cases happening at younger ages.<\/p>\r\n\r\n<h3 id=\"d16d68627ee44637b7922509c473e222\"><span style=\"color: #800080;\"><strong>Comments<\/strong><\/span><\/h3>\r\n<ol id=\"e5d982a9a44d454fbaeb71b3042763b6\">\r\n \t<li>\r\n<p id=\"b6bccc28ede443b88b60decf183e28b5\">Note that skewed distributions can also be bimodal. Here is an example. A medium-sized neighborhood 24-hour convenience store collected data from 537 customers on the amount of money spent in a single visit to the store. 
The following histogram displays the data.<\/p>\r\n<p id=\"f4ed573b0b504493bf7e6a34e947069e\"><span class=\"imagewrap\"><span class=\"image\"><img id=\"f7c9cec70df14232bd5bf8e0de306102\" class=\"img-responsive popimg aligncenter\" title=\"A histogram in which the Y-axis is labeled with units in Frequency, from 0 to 70. The X-axis is labeled in Dollars Spent, from 0 to 105. Going from left to right on the X-axis, the bars of the histogram increase to a peak at x=25, where y=70. Then, the bars decrease, but at x=45 they begin to increase again, reaching a second peak at x=50, where y=37. Then, the values decrease until the end of the histogram.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/bimode_hist.png\" alt=\"A histogram in which the Y-axis is labeled with units in Frequency, from 0 to 70. The X-axis is labeled in Dollars Spent, from 0 to 105. Going from left to right on the X-axis, the bars of the histogram increase to a peak at x=25, where y=70. Then, the bars decrease, but at x=45 they begin to increase again, reaching a second peak at x=50, where y=37. Then, the values decrease until the end of the histogram.\" \/><\/span><\/span><\/p>\r\n<p id=\"c0855bc4ca8b44d0b3d01b1838c90ef8\">Note that the overall shape of the distribution is skewed to the right with a clear mode around $25. In addition it has another (smaller) \u201cpeak\u201d (mode) around $50-55. 
The majority of the customers spend around $25 but there is a cluster of customers who enter the store and spend around $50-55.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"d8e29f599c9e40959b9c5d70262904d8\">If a distribution has more than two modes, we say that the distribution is\u00a0<em class=\"bold\">multimodal<\/em>.<\/p>\r\n<\/li>\r\n<\/ol>\r\n<p id=\"a7f4a9aa0aa74a13ba8305669ba99f38\">Recall our grades example:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"c7c17a788c6a4f6aac61dde2717fdce6\" class=\"img-responsive popimg aligncenter\" title=\"The exam grades histogram from earlier.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/histogram1.gif\" alt=\"The exam grades histogram from earlier.\" \/><\/span><\/span>\r\n<p id=\"e5a2fd72b40f4a4cb4eea6a7b75642c3\">As you can see from the histogram, the grades distribution is roughly symmetric.<\/p>\r\n\r\n<div id=\"bf4cc918d2f84f96a03e83c76a220354\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span style=\"color: #800080;\" title=\"Quick scroll up\">Center<\/span><\/h2>\r\n<p id=\"c791f7c7d7614583b9bec6428c8e8019\">The center of the distribution is its\u00a0<em class=\"italic\">midpoint<\/em>\u2014the value that divides the distribution so that approximately half the observations take smaller values, and approximately half the observations take larger values. Note that from looking at the histogram we can get only a rough estimate for the center of the distribution. (More exact ways of finding measures of center will be discussed in the next section.)<\/p>\r\n<p id=\"c62c7f34f4bf4b658e5044eccd6ba920\">Recall our grades example:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"afa86e794ae64d87800119430ebd3368\" class=\"img-responsive popimg aligncenter\" title=\"The exam grades histogram. The y-axis is labeled count, and the x-axis is labeled score. 
There is 1 count in 40-50 score interval, 2 counts in the 50-60 score interval, 4 counts in the 60-70 score interval, 5 counts in the 70-80 score interval, 2 counts in the 80-90 score interval, and 1 count in the 90-100 score interval.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/histogram1.gif\" alt=\"The exam grades histogram. The y-axis is labeled count, and the x-axis is labeled score. There is 1 count in 40-50 score interval, 2 counts in the 50-60 score interval, 4 counts in the 60-70 score interval, 5 counts in the 70-80 score interval, 2 counts in the 80-90 score interval, and 1 count in the 90-100 score interval.\" \/><\/span><\/span>\r\n<p id=\"f9be01cc9b034e979ec8a7a552c4a976\">As you can see from the histogram, the center of the grades distribution is roughly 70 (7 students scored below 70, and 8 students scored above 70).<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<div id=\"b25918491e9f42fe85d3907f9926cc7f\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span style=\"color: #800080;\" title=\"Quick scroll up\">Spread<\/span><\/h2>\r\n<p id=\"ab938edcb87741c78d1cc26e9e9e4691\">The\u00a0<em class=\"italic\">spread<\/em>\u00a0(also called\u00a0<em class=\"italic\">variability<\/em>) of the distribution can be described by the approximate range covered by the data. From looking at the histogram, we can approximate the smallest observation (<em class=\"italic\">min<\/em>), and the largest observation (<em class=\"italic\">max<\/em>), and thus approximate the range. 
(More exact ways of finding measures of spread are discussed in the next section.)<\/p>\r\n<p id=\"ad2343ed220d49109799ea0becfc94d7\">In our example:<\/p>\r\n\r\n<table id=\"f1701bc8cc384c1f9ca3314959577c74_bx\" class=\"table labeled\">\r\n<tfoot>\r\n<tr>\r\n<td class=\"captionwrap\"><\/td>\r\n<\/tr>\r\n<\/tfoot>\r\n<tbody>\r\n<tr>\r\n<td>\r\n<table id=\"f1701bc8cc384c1f9ca3314959577c74\" class=\"wbtable plain\">\r\n<tbody>\r\n<tr class=\"e\">\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"ac5a3844e162c42b888321680beecbe48\">Approximate min:<\/p>\r\n<\/td>\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"ad84fd3380c044b17b38c4e101f8652c1\">45 (the middle of the lowest interval of scores)<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"acf379f418fe4404ca8db69c1c705715c\">Approximate max:<\/p>\r\n<\/td>\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"aa7d902b66548488f82ddf541264b0558\">95 (the middle of the highest interval of scores)<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr class=\"e\">\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"abb4e235d702e402ebe8db9550899ad9d\">Approximate range:<\/p>\r\n<\/td>\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"acb654b1a69c841c486567faff0894ad1\">95 \u2212 45 = 50<\/p>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\n<\/div>\r\n<div id=\"c9f7b594a2744ea7839ec64c46bd54c5\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span style=\"color: #800080;\" title=\"Quick scroll up\">Outliers<\/span><\/h2>\r\n<p id=\"de656c551e0d42c586220c2d6c8c5890\"><em class=\"italic\">Outliers<\/em>\u00a0are observations that fall outside the overall pattern. 
For example, the following histogram represents a distribution that has a probable high outlier:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"acf1cefefb33419198f599049e4fdf01\" class=\"img-responsive popimg aligncenter\" title=\"A histogram with frequency on the Y-axis. As we go from left to right on the x-axis, the frequency increases to a peak at x=5, then decreases. Eventually, we reach 0 at x=11. All values of x &amp;gt; 10 have a frequency of 0, except for x=15, which has a frequency greater than zero. This is an outlier.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/histogram7.gif\" alt=\"A histogram with frequency on the Y-axis. As we go from left to right on the x-axis, the frequency increases to a peak at x=5, then decreases. Eventually, we reach 0 at x=11. All values of x &amp;gt; 10 have a frequency of 0, except for x=15, which has a frequency greater than zero. This is an outlier.\" \/><\/span><\/span>\r\n<p id=\"eb2406b7a227438396e03009a2c24540\">Go back and check the histogram of scores at the top of this page. 
As you can see, there are no outliers.<\/p>\r\n\r\n<div id=\"b9c0592a91b34a769e26d1f7253ac41c\" class=\"examplewrap\">\r\n<div class=\"exHead\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<h4 class=\"exHead\">Best Actress Oscar Winners<\/h4>\r\n<div class=\"example clearfix\">\r\n<p id=\"f7167e2bc4fd4705be859c8bbd0d109d\">To provide an example of a histogram applied to actual data, we will look at the ages of Best Actress Oscar winners from 1970 to 2013. (To see the full dataset, open <a href=\"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-content\/uploads\/sites\/206\/2025\/01\/Dataset_-Best-Actress-Oscar-Winners-1970\u20132013.docx\">Dataset_ Best Actress Oscar Winners (1970\u20132013)<\/a>.)<\/p>\r\n<p id=\"b96b59af8ddb4ed696a55341bcc56368\"><a href=\"https:\/\/plot.ly\/~OLI_Stanford\/69.embed?link=false\" target=\"new\" rel=\"noopener\">Review the histogram for the data<\/a>.<\/p>\r\n<p id=\"deb7e9c7953741e5bb001741940551aa\">We will now summarize the main features of the distribution of ages as it appears from the histogram:<\/p>\r\n<p id=\"eb3d292bad7e4ab6804a8fc15969b207\"><em class=\"italic\">Shape:<\/em>\u00a0The distribution of ages is skewed right. We have a concentration of data among the younger ages and a long tail to the right. The vast majority of the \u201cbest actress\u201d awards are given to young actresses, with very few awards given to actresses who are older.<\/p>\r\n<p id=\"a47911ce2a94401e873e5157d0a50573\"><em class=\"italic\">Center:<\/em>\u00a0The data seem to be centered around 34 or 35 years old. 
Note that this implies that roughly half the awards are given to actresses who are younger than about 34 or 35 years old.<\/p>\r\n<p id=\"a324582bba1c41099eeee02c6887b3e5\"><em class=\"italic\">Spread:<\/em>\u00a0The data range from about 20 to about 80, so the approximate range equals 80 \u2212 20 = 60.<\/p>\r\n<p id=\"e5f1a12d0f4d4435a9b49c88d91652a0\"><em class=\"italic\">Outliers:<\/em>\u00a0There seem to be two probable outliers to the far right and possibly three around 62 years old.<\/p>\r\n<p id=\"d1d486e2d23b4acfb6073cc6e0d1d0ba\">You can see how informative it is to know \u201cwhat to look at\u201d in a histogram. If there is one conclusion that we can make here, it is that Hollywood likes to give Oscars to young actresses.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\n[h5p id=\"18\"]\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<div class=\"example clearfix\">\r\n<h2><span style=\"color: #800080;\" title=\"Quick scroll up\">Let\u2019s Summarize<\/span><\/h2>\r\n<ul id=\"a7f98e29719345f2982caaa290ab2012\">\r\n \t<li>\r\n<p id=\"af7c8cfe702134a49bfee440577edc30a\">The histogram is a graphical display of the distribution of a quantitative variable. 
It plots the number (count) of observations that fall in intervals of values.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"af6ac88ffd8cf4f73900c52495f63ef37\">When examining the distribution of a quantitative variable, one should describe the overall pattern of the data (shape, center, spread), and any deviations from the pattern (outliers).<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"ae46a905f4bd041b78883a024fe4b55d8\">When describing the shape of a distribution, one should consider:<\/p>\r\n\r\n<ul id=\"b97405d248764e8894c3de3791801766\">\r\n \t<li>\r\n<p id=\"af89fa5c460c74d5aa6c32b5cfba2be0c\">Symmetry\/skewness of the distribution<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"aa3c8c08cc8344f9aac4bc5d171a7cece\">Peakedness (modality)\u2014the number of peaks (modes) the distribution has.<\/p>\r\n<\/li>\r\n<\/ul>\r\n<p id=\"afbc5377f40294356a089c5d14af48878\">*Not all distributions have a simple, recognizable shape.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"ad874d214e0dc4ce3b6188c8a404c0112\">Outliers are data points that fall outside the overall pattern of the distribution and need further research before continuing the analysis.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"ae116673fdc054b04bd6d39f4314d6493\">It is always important to interpret what the features of the distribution (as they appear in the histogram) mean in the context of the data.<\/p>\r\n<\/li>\r\n<\/ul>\r\n<h2><span style=\"color: #800080;\">Stemplot<\/span><\/h2>\r\n<div class=\"\">\r\n\r\nThe stem plot (also called stem and leaf plot) is another graphical display of the distribution of quantitative data.\r\n\r\n<\/div>\r\n<div id=\"bf98171802b8430da609c0837be4117e\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h3><span title=\"Quick scroll up\">Idea<\/span><\/h3>\r\n<p id=\"c8504d1cf9784becbe085351f41002ad\">Separate each data point into a stem and leaf, as follows:<\/p>\r\n\r\n<table id=\"db8dc46a215f465496936f28cd48d7e6_bx\" class=\"table labeled\">\r\n<tfoot>\r\n<tr>\r\n<td 
class=\"captionwrap\"><\/td>\r\n<\/tr>\r\n<\/tfoot>\r\n<tbody>\r\n<tr>\r\n<td>\r\n<table id=\"db8dc46a215f465496936f28cd48d7e6\" class=\"wbtable plain\">\r\n<tbody>\r\n<tr class=\"e\">\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"d64fe3b91ab0407f8874859621192470\">The leaf is the right-most digit.<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"d61356229ea74b95ada245d1161a4ed6\">The stem is everything except the right-most digit.<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr class=\"e\">\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"d324d4fa775344aa850300828b69bbcf\">So, if the data point is 34, then 3 is the stem and 4 is the leaf.<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\r\n<p id=\"ce682de7f8c24cff83a3a0039ef8fa26\">If the data point is 3.41, then 3.4 is the stem and 1 is the leaf.<\/p>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<\/div>\r\n<\/div>\r\n<div id=\"fa3a41b70e5b497aac0e6d920cba243b\" class=\"examplewrap\">\r\n<div class=\"exHead\">\r\n<div class=\"textbox textbox--examples\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Example<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n<div class=\"examplewrap\">\r\n<h4 class=\"exHead\">Best Actress Oscar Winners<\/h4>\r\n<div class=\"example clearfix\">\r\n<div>\r\n<p id=\"db189709163244ee8700eef59008b7cf\">We will continue with the Best Actress Oscar winners example (To see the full dataset, <a href=\"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-content\/uploads\/sites\/206\/2025\/01\/Dataset_-Best-Actress-Oscar-Winners-1970\u20132013.docx\">Dataset_ Best Actress Oscar Winners (1970\u20132013)<\/a>.)<\/p>\r\n<p id=\"bab2b1edf74b4fcc8e31e91c6dffee42\">34 34 27 37 42 41 36 32 41 33 31 74 33 49 38 61 21 41 26 80 42 29 33 36 45 49 39 34 26 25 33 35 35 28 30 29 61 32 33 45 29 62 22 44<\/p>\r\n<p 
id=\"f3147ead358b405d89e2a7c17c6a2df2\"><em>To make a stem plot:<\/em><\/p>\r\n\r\n<ol id=\"a4955f64c31c47159d296e741b997424\">\r\n \t<li>\r\n<p id=\"f7f643df98b04f4e8215f92e37d5827e\">Separate each observation into a stem and a leaf.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"b9775d5e5798480cb443f81dfa473a34\">Write the stems in a vertical column with the smallest at the top, and draw a vertical line at the right of this column.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"e0b245a1443f4f908d50631e57353a5d\">Go through the data points, and write each leaf in the row to the right of its stem.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"b344d231b28b47519839e13511c55845\">Rearrange the leaves in an increasing order.<\/p>\r\n<\/li>\r\n<\/ol>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"ea6701605ef5424581df709c38a8afe2\" class=\"img-responsive popimg aligncenter\" title=\"The result of steps 1, 2, and 3 on the given data set results in the following: first row: 2|7169658992 second row: 3|3376231383694355023 third row: 4|2119124954 fourth row: 5| fifth row: 6|112 sixth row: 7|4 seventh row: 8|0 Step 4 results in: first row: 2|1256678999 second row: 3|0122333333445566789 third row: 4|1112244599 fourth row: 5| fifth row: 6|112 sixth row: 7|4 seventh row: 8|0 Following the extra step(*): first row: 2|12 second row: 2|56678999 third row: 3|012233333344 fourth row: 3|5566789 fifth row: 4|1112244 sixth row: 4|599 seventh row: 5| eighth row: 5| ninth row: 6|112 tenth row: 7|4 eleventh row:7| twelfth row: 8|0\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/eda_examining_distributions_best_actress_stemplot.jpg\" alt=\"The result of steps 1, 2, and 3 on the given data set results in the following: first row: 2|7169658992 second row: 3|3376231383694355023 third row: 4|2119124954 fourth row: 5| fifth row: 6|112 sixth row: 7|4 seventh row: 8|0 Step 4 results in: first row: 2|1256678999 
second row: 3|0122333333445566789 third row: 4|1112244599 fourth row: 5| fifth row: 6|112 sixth row: 7|4 seventh row: 8|0 Following the extra step(*): first row: 2|12 second row: 2|56678999 third row: 3|012233333344 fourth row: 3|5566789 fifth row: 4|1112244 sixth row: 4|599 seventh row: 5| eighth row: 5| ninth row: 6|112 tenth row: 7|4 eleventh row:7| twelfth row: 8|0\" \/><\/span><\/span>\r\n<p id=\"c1e0f443cb3b4c2388792dc75151a83b\">* When some of the stems hold a large number of leaves, we can split each stem into two: one holding the leaves 0-4, and the other holding the leaves 5-9. A statistical software package will often do the splitting for you, when appropriate.<\/p>\r\n<p id=\"ce6ad5a030954922b835b279f5acee40\"><em>Note\u00a0<\/em>that when rotated 90 degrees counterclockwise, the stem plot visually resembles a histogram:<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"ee6ba663384840fca565fd33cd8b7923\" class=\"img-responsive popimg aligncenter\" title=\"A rotated stem plot. This is the same as the last stem plot given in the previous image, but rotated so that the stems are at the bottom, with the leaves on top.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/eda_examining_distributions_best_actress_stemplot_rotated.jpg\" alt=\"A rotated stem plot. 
This is the same as the last stem plot given in the previous image, but rotated so that the stems are at the bottom, with the leaves on top.\" \/><\/span><\/span>\r\n<p id=\"e738b0fbab9a48c784177743928065f0\">This orientation makes the right-skewedness of the distribution clearly visible.<\/p>\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<p id=\"b92fdbb1e528448ba60cb2d27f06fc22\">The stem plot has additional unique features:<\/p>\r\n\r\n<ul id=\"e5e87270d7a44e12bf4fb095b8abc03f\">\r\n \t<li>\r\n<p id=\"a1f166ce8aa848009a0978de3fb76b20\">It preserves the original data.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"b40943fe1a924824a1d705b470f99f0e\">It sorts the data (which will become very useful in the next section).<\/p>\r\n<\/li>\r\n<\/ul>\r\n<h2><span style=\"color: #800080;\">Comment<\/span><\/h2>\r\n<\/div>\r\n<\/div>\r\n<div id=\"df18416eb2b3459a89e1d17317b51e06\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<p id=\"dffd73019194444184d8364c7aa085ab\">There is another type of display that we can use to summarize a quantitative variable graphically\u2014the\u00a0<em>dot plot<\/em>. The dot plot, like the stem plot, shows each observation, but displays it with a dot rather than with its actual value. Here is the dot plot for the ages of Best Actress Oscar winners.<\/p>\r\n<span class=\"imagewrap\"><span class=\"image\"><img id=\"ed46b761648e4c2e8c757cc94e61a38d\" class=\"img-responsive popimg aligncenter\" title=\"A dotplot titled &amp;quot;Dotplot of Age&amp;quot; A number line is at the bottom of the image, labeled in units of age from 24 to 80. 
At each age on the number line, a line of dots, each representing one winner of that age, appears above the place of that age on the number line.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/eda_examining_distributions_best_actress_dotplot.jpg\" alt=\"A dotplot titled &amp;quot;Dotplot of Age&amp;quot; A number line is at the bottom of the image, labeled in units of age from 24 to 80. At each age on the number line, a line of dots, each representing one winner of that age, appears above the place of that age on the number line.\" width=\"800\" \/><\/span><\/span>\r\n\r\n<\/div>\r\n<\/div>\r\n<div id=\"fd2306ffbdf24c98966c4c26367a5aaa\" class=\"section\">\r\n<div class=\"sectionContain\">\r\n<h2><span style=\"color: #800080;\" title=\"Quick scroll up\">Let\u2019s Summarize<\/span><\/h2>\r\n<p id=\"b45dd1a773af459d8c07c085b0b95860\">The stem plot is a simple but useful visual display of quantitative data. 
Its principal virtues are:<\/p>\r\n\r\n<ul id=\"f0a1f4f3902a47e5ad498551b9f77415\">\r\n \t<li>\r\n<p id=\"ed5c01e342e74ed69326cd93d93b0ed3\">Easy and quick to construct for small, simple datasets.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"fbd166c58daa442b815973be5c9c2cd7\">Retains the actual data.<\/p>\r\n<\/li>\r\n \t<li>\r\n<p id=\"c17f077ad7e34bd99b05c7acc1913fdd\">Sorts (ranks) the data.<\/p>\r\n<\/li>\r\n<\/ul>\r\n<div class=\"textbox textbox--exercises\"><header class=\"textbox__header\">\r\n<h3 class=\"textbox__title\">Many students wonder...<\/h3>\r\n<\/header>\r\n<div class=\"textbox__content\">\r\n\r\n<strong>Question:<\/strong> How do we know which graph to use: the histogram, stemplot, or dotplot?\r\n\r\n<strong>Answer:<\/strong> Since for the most part we are not going to deal with very small data sets in this course, we will generally display the distribution of a quantitative variable using a histogram generated by a statistical software package.\r\n\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>\r\n<\/div>","rendered":"<p>As indicated in the introduction, we will begin the Exploratory Data Analysis part of the course by exploring (or looking at) one variable at a time.<\/p>\n<p>As we saw in <a href=\"https:\/\/pressbooks.ccconline.org\/mat1260\/chapter\/1-2-data-types-and-levels-of-measurement\/\">Data Types and Levels of Measurement<\/a>, the data for each variable are a long list of values (whether numerical or not), and are not very informative in that form. In order to convert these raw data into useful information we need to summarize and then examine the\u00a0<em>distribution<\/em>\u00a0of the variable. By\u00a0<em>distribution<\/em>\u00a0of a variable, we mean:<\/p>\n<ul>\n<li>what values the variable takes, and<\/li>\n<li>how often the variable takes those values.<\/li>\n<\/ul>\n<p>This chapter has two sections. 
We will first learn how to summarize and examine the distribution of a single categorical variable, and then do the same for a quantitative variable.<\/p>\n<h2><span style=\"color: #800080;\">Organizing One Categorical Variable<\/span><\/h2>\n<p id=\"ebc684a34ec14c99806984fa95c2fc4c\">What is your perception of your own body? Do you feel that you are overweight, underweight, or about right?<\/p>\n<p id=\"d237e9121f28484cba1c00fed8bd6157\">A random sample of 1,200 U.S. college students were asked this question as part of a larger survey. The following table shows part of the responses:<\/p>\n<table id=\"f015b12e7e6740a0a03a9830cd268dfe_bx\" class=\"table labeled\">\n<thead>\n<tr>\n<th>Body Image<\/th>\n<\/tr>\n<\/thead>\n<tfoot>\n<tr>\n<td class=\"captionwrap\"><\/td>\n<\/tr>\n<\/tfoot>\n<tbody>\n<tr>\n<td>\n<table id=\"f015b12e7e6740a0a03a9830cd268dfe\" class=\"wbtable plain\">\n<thead>\n<tr>\n<th colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"abfbeeaef537143978827d5090a8d3b95\">Student<\/p>\n<\/th>\n<th colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"ab91d2e3725044630b0329a323928a157\">Body Image<\/p>\n<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"aa9f51764e276490caa7b986f90e24791\">student 25<\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"ac043b9fcf1434a00b9a1e3c44f33865c\">overweight<\/p>\n<\/td>\n<\/tr>\n<tr class=\"e\">\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"aeed717a4023f42368f2288c3e1732920\">student 26<\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"ae2450789be0d48599a11a8f6bdc3174e\">about right<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"aa90e21f0004b468cb4c98a0a712986b7\">student 27<\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"acde29c624aa64dc396827e23c46030b7\">underweight<\/p>\n<\/td>\n<\/tr>\n<tr class=\"e\">\n<td colspan=\"1\" rowspan=\"1\" 
align=\"left\">\n<p id=\"aa0393977ae1c42d58b64f3d41592c6e3\">student 28<\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"aea1008255fe9494394a26a94cdd8fbbb\">about right<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"afd9dada760714c588dd0072424d6b2b7\">student 29<\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"ad06450ac245a44df8950ad069d01e675\">about right<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p id=\"f6eed5f74abb4c27a93ea601d88fc7ad\">Here is some information that would be interesting to get from these data:<\/p>\n<ul id=\"b6a4cebd0b2e418991e418bea82f4fe7\">\n<li>\n<p id=\"ad39c42534ad643df86cd8990fbd105d6\">What percentage of the sampled students fall into each category?<\/p>\n<\/li>\n<li>\n<p id=\"ab783a0c6695f469dbae26992a2e5784a\">How are students divided across the three body image categories? Are they equally divided? If not, do the percentages follow some other kind of pattern?<\/p>\n<\/li>\n<\/ul>\n<p id=\"b66d885462974431a9ff4422ad787aaa\">There is no way that we can answer these questions by looking at the raw data, which are in the form of a long list of 1,200 responses, and thus not very useful. However, both these questions will be easily answered once we summarize and look at the\u00a0<em class=\"italic\">distribution<\/em>\u00a0of the variable Body Image (i.e., once we summarize how often each of the categories occurs).<\/p>\n<p id=\"ab6d04e44e5d45b09d456bf0e63755d5\">In order to summarize the distribution of a categorical variable, we first create a table of the different values (categories) the variable takes, how many times each value occurs (count) and, more importantly, how often each value occurs (by converting the counts to percentages); this table is called a frequency distribution. 
Here is the frequency distribution for our example:<\/p>\n<table id=\"a9fb40630d43453db0dc8e77bc6c3f0c_bx\" class=\"table labeled\">\n<thead>\n<tr>\n<th>Body Image Distribution<\/th>\n<\/tr>\n<\/thead>\n<tfoot>\n<tr>\n<td class=\"captionwrap\"><\/td>\n<\/tr>\n<\/tfoot>\n<tbody>\n<tr>\n<td>&nbsp;<\/p>\n<table id=\"a9fb40630d43453db0dc8e77bc6c3f0c\" class=\"wbtable plain\" style=\"height: 135px;\">\n<thead>\n<tr style=\"height: 27px;\">\n<th style=\"height: 27px; width: 78.725px;\" colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"aab5e9139fb104133b58e648acd2d1848\">category<\/p>\n<\/th>\n<th style=\"height: 27px; width: 40.75px;\" colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"af5df22733e3d43cdbaed6706d1043e2b\">Count<\/p>\n<\/th>\n<th style=\"height: 27px; width: 311.25px;\" colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"ab49b0b5b9dda492780471232d20a2626\">Percent<\/p>\n<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr style=\"height: 27px;\">\n<td style=\"height: 27px; width: 79.125px;\" colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"adfadaaf470244f98a86ce881508487c8\">About right<\/p>\n<\/td>\n<td style=\"height: 27px; width: 41.55px;\" colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"aab825b4b284e472c961d7fab1be94b26\">855<\/p>\n<\/td>\n<td style=\"height: 27px; width: 311.65px;\" colspan=\"1\" rowspan=\"1\" align=\"left\">[latex]\\left(\\frac{855}{1200}\\right)\\times100=71.3%[\/latex]<\/td>\n<\/tr>\n<tr class=\"e\" style=\"height: 27px;\">\n<td style=\"height: 27px; width: 79.125px;\" colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"ad07d01fb459347c4b7903d469bc6261b\">Overweight<\/p>\n<\/td>\n<td style=\"height: 27px; width: 41.55px;\" colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"aa3811b37691747f18dfbca1a5c4052d7\">235<\/p>\n<\/td>\n<td style=\"height: 27px; width: 311.65px;\" colspan=\"1\" rowspan=\"1\" align=\"left\">[latex]\\left(\\frac{235}{1200}\\right)\\times100=19.6%[\/latex]<\/td>\n<\/tr>\n<tr style=\"height: 27px;\">\n<td 
style=\"height: 27px; width: 79.125px;\" colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"ab335cfce15d743538fb8acc94e4a5dea\">Underweight<\/p>\n<\/td>\n<td style=\"height: 27px; width: 41.55px;\" colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"acb5d578f8a9d44519c3b5d2e86b39900\">110<\/p>\n<\/td>\n<td style=\"height: 27px; width: 311.65px;\" colspan=\"1\" rowspan=\"1\" align=\"left\">[latex]\\left(\\frac{110}{1200}\\right)\\times100=9.2%[\/latex]<\/td>\n<\/tr>\n<tr class=\"e\" style=\"height: 27px;\">\n<td style=\"height: 27px; width: 79.125px;\" colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"b33c9d57b32143e79950b35f201ef6d1\"><em class=\"italic\">Total<\/em><\/p>\n<\/td>\n<td style=\"height: 27px; width: 41.55px;\" colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"bfbea44f32d44514ad266b1acec495f7\"><em class=\"italic\">n=1200<\/em><\/p>\n<\/td>\n<td style=\"height: 27px; width: 311.65px;\" colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"a162d5ffe2ba4caa9e3b536864983fa6\"><em class=\"italic\">100%<\/em><\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p id=\"b309adaa461a4fb8a1e3a7fe56a88157\">In order to visualize the numerical summaries we\u2019ve obtained, we need a graphical display. There are two simple graphical displays for visualizing the distribution of categorical data:<\/p>\n<ol id=\"abda342a14a44c419d7cbcece901ab7b\">\n<li>\n<p id=\"ac374642c375849bcb49d4ff7b9598731\">The Pie Chart<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"fa916b9abbb9431388d5aa8be3a274d0\" class=\"img-responsive popimg aligncenter\" title=\"A pie chart of the distribution. Taking up 71.3% of the chart is the &amp;quot;about right&amp;quot; category, which is labeled with &amp;quot;about right (855, 71.3%)&amp;quot;. 
Another 9.2% of the chart is occupied by the section labeled &amp;quot;underweight (110, 9.2%)&amp;quot;, and taking up 19.6% of the chart is the area labeled &amp;quot;overweight (235, 19.6%)&amp;quot;. In total the three sections fill up the entire pie, so they make up 100% of the chart, which represents the entirety of the data.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/one_cat_var1.gif\" alt=\"A pie chart of the distribution. Taking up 71.3% of the chart is the &amp;quot;about right&amp;quot; category, which is labeled with &amp;quot;about right (855, 71.3%)&amp;quot;. Another 9.2% of the chart is occupied by the section labeled &amp;quot;underweight (110, 9.2%)&amp;quot;, and taking up 19.6% of the chart is the area labeled &amp;quot;overweight (235, 19.6%)&amp;quot;. In total the three sections fill up the entire pie, so they make up 100% of the chart, which represents the entirety of the data.\" \/><\/span><\/span><\/li>\n<li>The Bar Chart\n<p id=\"ac2ae7e5cabb74d85ab0dfdfb9278441c\"><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"ef776cb1541a4069a28b950dea8dcb9c\" class=\"img-responsive popimg aligncenter\" title=\"Two bar charts. Since these bar charts can only show one type of unit on the vertical axis, two are required, one to show counts and one to show percentages. The first bar chart shows counts on the vertical axis, from 0 to 900. The horizontal axis has 3 labels under 3 bars. The first bar is labeled &amp;quot;about right&amp;quot; and is the tallest. It extends from the 0 mark on the vertical axis to between the 800 and 900 mark. The second bar is labeled &amp;quot;overweight&amp;quot; and starts at the 0 mark and ends at about the 200 mark. The third bar is labeled &amp;quot;underweight&amp;quot; and starts at the 0 mark and ends between the 100 and 200 mark. 
The second bar chart is identical to the first one, except the vertical axis has been changed to Percent units, and goes from 0 to 70. The bars are the same as in the first chart.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/one_cat_var2.gif\" alt=\"Two bar charts. Since these bar charts can only show one type of unit on the vertical axis, two are required, one to show counts and one to show percentages. The first bar chart shows counts on the vertical axis, from 0 to 900. The horizontal axis has 3 labels under 3 bars. The first bar is labeled &amp;quot;about right&amp;quot; and is the tallest. It extends from the 0 mark on the vertical axis to between the 800 and 900 mark. The second bar is labeled &amp;quot;overweight&amp;quot; and starts at the 0 mark and ends at about the 200 mark. The third bar is labeled &amp;quot;underweight&amp;quot; and starts at the 0 mark and ends between the 100 and 200 mark. The second bar chart is identical to the first one, except the vertical axis has been changed to Percent units, and goes from 0 to 70. 
The bars are the same as in the first chart.\" \/><\/span><\/span><\/p>\n<\/li>\n<\/ol>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<p class=\"textbox__title\">Did I get this?<\/p>\n<\/header>\n<div class=\"textbox__content\">\n<div id=\"h5p-12\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-12\" class=\"h5p-iframe\" data-content-id=\"12\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"2.1 Learn by doing 1\"><\/iframe><\/div>\n<\/div>\n<p>Now that we have summarized the distribution of values in the Body Image variable, let&#8217;s go back and interpret the results in the context of the questions that we posed:<\/p>\n<div id=\"h5p-13\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-13\" class=\"h5p-iframe\" data-content-id=\"13\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"2.1 Learn by doing 2\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<p id=\"d125fb9c155a48778c9c0fe9d672f12c\">Now that we&#8217;ve interpreted the results, there are some other interesting questions that arise:<\/p>\n<ul id=\"fbcf0af0011b4700b43cac6d02a9f1f8\">\n<li>\n<p id=\"aa99dcaeebbe64ae1b5c9324d51ab0056\">Can we reliably generalize our results to the entire population of interest and conclude that a similar distribution across body image categories exists among all U.S. college students? 
In particular, can we make such a generalization even though our sample consisted of only 1,200 students, which is a very small fraction of the entire population?<\/p>\n<\/li>\n<li>\n<p id=\"af2dc42d13ad34e89855883d8a773efbd\">If we had separated our sample by gender and looked at males and females separately, would we have found a similar distribution across body image categories?<\/p>\n<\/li>\n<\/ul>\n<p id=\"eaea6962a663463b9f271f515a27f826\">These are the types of questions that we will deal with in future sections of the course.<\/p>\n<div id=\"f0971b190bec4f1fb006fd91a77fbee7\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span style=\"color: #800080;\" title=\"Quick scroll up\">Comments<\/span><\/h2>\n<ol id=\"db72978a062640c9ada6ef8460b9d83e\">\n<li>\n<p id=\"e0b6496fee5443dd8b12536ed1204035\">While both the pie chart and the bar chart help us visualize the distribution of a categorical variable, the pie chart emphasizes how the different categories relate to the whole, and the bar chart emphasizes how the different categories compare with each other.<\/p>\n<\/li>\n<li>\n<p id=\"ca012d8d28ae44649c8dbd32e816f018\">A variation on the pie chart and bar chart that is very commonly used in the media is the\u00a0<em class=\"italic\">pictogram<\/em>. Here are two examples:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"fc8ff3bfd5734e10b4aabeacf98c1fe2\" class=\"img-responsive popimg aligncenter\" title=\"A bar chart in which the bars have been replaced by rolls of unraveled toilet paper. 
The chart is titled &amp;quot;How we flush a public toilet&amp;quot; The first bar is labeled &amp;quot;Use shoe, 41%&amp;quot;, the second bar is labeled &amp;quot;Act normally 30%&amp;quot;, and the last bar is labeled &amp;quot;Paper towel 17%&amp;quot;\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/one_cat_var3.jpg\" alt=\"A bar chart in which the bars have been replaced by rolls of unraveled toilet paper. The chart is titled &amp;quot;How we flush a public toilet&amp;quot; The first bar is labeled &amp;quot;Use shoe, 41%&amp;quot;, the second bar is labeled &amp;quot;Act normally 30%&amp;quot;, and the last bar is labeled &amp;quot;Paper towel 17%&amp;quot;\" \/><\/span><\/span><\/p>\n<p id=\"ed85abcda61b4888b1f8845d3f7b641d\">Source: USA Today Snapshots and the Impulse Research for Northern Confidential Bathroom survey<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"d6f6d5bacdc541eb9377c547261c5a3c\" class=\"img-responsive popimg aligncenter\" title=\"A pie chart made out of a slice of cucumber. The cucumber is on a fork, which in turn is over a dinner table. The pie chart is titled &amp;quot;How often are salads eaten (per week)&amp;quot;. The pie chart shows 4 sections: Never (3%), Daily (13%), 2 or less (37%), 3-6 times (47%).\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/one_cat_var4.jpg\" alt=\"A pie chart made out of a slice of cucumber. The cucumber is on a fork, which in turn is over a dinner table. The pie chart is titled &amp;quot;How often are salads eaten (per week)&amp;quot;. 
The pie chart shows 4 sections: Never (3%), Daily (13%), 2 or less (37%), 3-6 times (47%).\" \/><\/span><\/span><\/p>\n<p id=\"b2b144c9830c4827af54120708e69678\">Source: Market Facts for the Association of Dressings and Sauces<\/p>\n<\/li>\n<li>\n<p id=\"e60efc22f42b4710a4d1956442dd3981\"><em class=\"italic\">Beware:<\/em>\u00a0Pictograms can be misleading. Consider the following pictogram:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"ab43bf06ac4f442681162e3938ff280e\" class=\"img-responsive popimg aligncenter\" title=\"A chart in which three items are represented by the size of a fountain pen. The chart is labeled &amp;quot;No. 1 for the Money with Consumer Services Advertisers&amp;quot; The smallest pen is U.S. News $1,537,617. The second smallest pen is Newsweek $2,698,386. The largest pen is TIME $4,433,879.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/one_cat_var5.jpg\" alt=\"A chart in which three items are represented by the size of a fountain pen. The chart is labeled &amp;quot;No. 1 for the Money with Consumer Services Advertisers&amp;quot; The smallest pen is U.S. News $1,537,617. The second smallest pen is Newsweek $2,698,386. The largest pen is TIME $4,433,879.\" \/><\/span><\/span><\/p>\n<p id=\"ef8253b1fb87419591ea6e7aac90a7c4\">This graph is aimed at advertisers deciding where to spend their budgets, and clearly suggests that\u00a0<em class=\"italic\">Time<\/em>\u00a0magazine attracts by far the largest amount of advertising spending. Are the differences really as dramatic as the graph suggests? 
If we look carefully at the numbers above the pens, we find that advertisers spend only $4,433,879 \/ $2,698,386 = 1.64 times as much in\u00a0<em class=\"italic\">Time<\/em>\u00a0as in\u00a0<em class=\"italic\">Newsweek<\/em>, and only $4,433,879 \/ $1,537,617 = 2.88 times as much as in\u00a0<em class=\"italic\">U.S. News<\/em>. By looking at the pictogram, however, we get the impression that\u00a0<em class=\"italic\">Time<\/em>\u00a0is much further ahead. Why? In order to magnify the picture without distorting it, we must increase\u00a0<em class=\"italic\">both<\/em>\u00a0its height and width. As a result, the\u00a0<em class=\"italic\">area<\/em>\u00a0of\u00a0<em class=\"italic\">Time\u2019s<\/em>\u00a0pen is 1.64 * 1.64 = 2.7 times as large as the\u00a0<em class=\"italic\">Newsweek<\/em>\u00a0pen, and 2.88 * 2.88 = 8.3 times as large as the\u00a0<em class=\"italic\">U.S. News<\/em>\u00a0pen. Our eyes capture the area of the pens rather than only the height, and so we are misled into thinking that\u00a0<em class=\"italic\">Time<\/em>\u00a0is a bigger winner than it really is.<\/p>\n<\/li>\n<\/ol>\n<\/div>\n<\/div>\n<div class=\"activitywrap purpose learnbydoing wbactivity\">\n<h2><span style=\"color: #800080;\">Let\u2019s Summarize<\/span><\/h2>\n<\/div>\n<div id=\"c54fda04eef54dc1a369332fa1c4a0ba\" class=\"section\">\n<div class=\"sectionContain\">\n<ul id=\"e511ec6181824e61ada9094531c6b7f4\">\n<li>\n<p id=\"afc70ab5c43594db3b7696417ba1be701\">The distribution of a categorical variable is summarized using:<\/p>\n<ul id=\"eba3f6877c9c496dad87d40bcdb3413d\">\n<li>\n<p id=\"b8847c82abb044eea19694085c110d34\"><em class=\"italic\">Graphical display:<\/em>\u00a0pie chart or bar chart, supplemented by<\/p>\n<\/li>\n<li>\n<p id=\"cd35655682f64872a5e1006500a17948\"><em class=\"italic\">Numerical summaries:<\/em>\u00a0category counts and percentages.<\/p>\n<\/li>\n<\/ul>\n<\/li>\n<li>\n<p id=\"ad0c0de81479243c6ab96ca678f92e7f3\">A variation on pie charts and bar charts is the 
pictogram.<\/p>\n<\/li>\n<li>\n<p id=\"adbb8c6f365194e9ab18ee48d83f5a056\">Pictograms can be misleading, so make sure to use a critical approach when interpreting the information the pictogram is trying to convey.<\/p>\n<\/li>\n<\/ul>\n<h2><span style=\"color: #800080;\">Organizing One Quantitative Variable<\/span><\/h2>\n<p id=\"c1be7527b17a41cc81f4f6a48da1c290\">In the previous section, we explored the distribution of a categorical variable using graphs (pie chart, bar chart) supplemented by numerical measures (percent of observations in each category). In this section, we will explore the data collected from a\u00a0<em>quantitative\u00a0<\/em>variable, and learn how to describe and summarize the important features of its distribution. We will first learn how to display the distribution using graphs and then move on to discuss numerical measures.<\/p>\n<p>To display data from one quantitative variable graphically, we can use either the\u00a0<em>histogram<\/em>\u00a0or the\u00a0<i>stemplot<\/i>. (Another graph, the\u00a0<em>boxplot<\/em>, will be mentioned later.)<\/p>\n<div id=\"lobjh\" class=\"\">\n<h3>Idea<\/h3>\n<\/div>\n<div id=\"N10B27\" class=\"section\">\n<div class=\"sectionContain\">\n<p id=\"N10B2E\">Break the range of values into intervals and count how many observations fall into each interval.<\/p>\n<div class=\"examplewrap\">\n<div class=\"example clearfix\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<h4>Exam Grades<\/h4>\n<div>\n<p id=\"N10B36\">Here are the exam grades of 15 students:<\/p>\n<table class=\"formula\">\n<tbody>\n<tr>\n<td>88, 48, 60, 51, 57, 85, 69, 75, 97, 72, 71, 79, 65, 63, 73<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<p id=\"N10B3D\">We first need to break the range of values into intervals (also called \u201cbins\u201d or \u201cclasses\u201d). 
In this case, since our dataset consists of exam scores, it will make sense to choose intervals that typically correspond to the range of a letter grade, 10 points wide: 40\u201350, 50\u201360, \u2026 90\u2013100. By counting how many of the 15 observations fall in each of the intervals, we get the following table:<\/p>\n<table id=\"N10B40_bx\" class=\"table labeled\">\n<thead>\n<tr>\n<th>Exam Grades<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>\n<table class=\"wbtable\">\n<thead>\n<tr>\n<th>Score<\/th>\n<th>Count<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>[40\u201350)<\/td>\n<td>1<\/td>\n<\/tr>\n<tr class=\"e\">\n<td>[50\u201360)<\/td>\n<td>2<\/td>\n<\/tr>\n<tr>\n<td>[60\u201370)<\/td>\n<td>4<\/td>\n<\/tr>\n<tr class=\"e\">\n<td>[70\u201380)<\/td>\n<td>5<\/td>\n<\/tr>\n<tr>\n<td>[80\u201390)<\/td>\n<td>2<\/td>\n<\/tr>\n<tr class=\"e\">\n<td>[90\u2013100]<\/td>\n<td>1<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p id=\"N10B85\">Note: The observation 60 was counted in the 60\u201370 interval. 
See comment 1 below.<\/p>\n<p>To construct the histogram from this table we plot the intervals on the X-axis, and show the number of observations in each interval (frequency of the interval) on the Y-axis, which is represented by the height of a rectangle located above the interval:<\/p>\n<\/div>\n<div>\n<div class=\"wp-nocaption alignnone wp-image-380 size-full\"><img loading=\"lazy\" decoding=\"async\" class=\"wp-image-380 size-full aligncenter\" src=\"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-content\/uploads\/sites\/206\/2024\/09\/newplot-2.png\" alt=\"\" width=\"1795\" height=\"851\" srcset=\"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-content\/uploads\/sites\/206\/2024\/09\/newplot-2.png 1795w, https:\/\/pressbooks.ccconline.org\/mat1260\/wp-content\/uploads\/sites\/206\/2024\/09\/newplot-2-300x142.png 300w, https:\/\/pressbooks.ccconline.org\/mat1260\/wp-content\/uploads\/sites\/206\/2024\/09\/newplot-2-1024x485.png 1024w, https:\/\/pressbooks.ccconline.org\/mat1260\/wp-content\/uploads\/sites\/206\/2024\/09\/newplot-2-768x364.png 768w, https:\/\/pressbooks.ccconline.org\/mat1260\/wp-content\/uploads\/sites\/206\/2024\/09\/newplot-2-1536x728.png 1536w, https:\/\/pressbooks.ccconline.org\/mat1260\/wp-content\/uploads\/sites\/206\/2024\/09\/newplot-2-65x31.png 65w, https:\/\/pressbooks.ccconline.org\/mat1260\/wp-content\/uploads\/sites\/206\/2024\/09\/newplot-2-225x107.png 225w, https:\/\/pressbooks.ccconline.org\/mat1260\/wp-content\/uploads\/sites\/206\/2024\/09\/newplot-2-350x166.png 350w\" sizes=\"auto, (max-width: 1795px) 100vw, 1795px\" \/><\/div>\n<p id=\"N10B93\">The table above can also be turned into a relative frequency table using the following steps:<\/p>\n<ol class=\"decimal\">\n<li>Add a row on the bottom and include the total number of observations in the dataset that are represented in the table.<\/li>\n<li>Add a column, at the end of the table, and calculate the relative frequency for each interval, by dividing the number of 
observations in each row by the total number of observations.<\/li>\n<\/ol>\n<p id=\"N10BA0\">These two steps are illustrated in red in the following frequency distribution table:<\/p>\n<p><img decoding=\"async\" id=\"_i_1\" class=\"aligncenter\" title=\"\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/feq_table.png\" alt=\"\" width=\"750\" \/><\/p>\n<p id=\"N10BA8\">It is also possible to determine the number of scores for an interval if you have the total number of observations and the relative frequency for that interval. For instance, suppose there are 15 scores (or observations) in a set of data and the relative frequency for an interval is .13. To determine the number of scores in that interval, multiply the total number of observations by the relative frequency and round to the nearest whole number (the relative frequency itself was rounded): 15*.13 = 1.95, which rounds to 2 observations.<\/p>\n<p id=\"N10BAB\">A relative frequency table, like the one above, can be used to determine the frequency of scores occurring at or across intervals. Here are some examples, using the above frequency table:<\/p>\n<ul class=\"decimal\">\n<li>What is the percentage of exam scores that were at least 70 but less than 80?\n<ul class=\"decimal\">\n<li>To determine the answer, we look at the relative frequency associated with the [70-80) interval. The relative frequency is .33; to convert to a percentage, multiply by 100 (.33*100 = 33), which gives 33%.<\/li>\n<\/ul>\n<\/li>\n<li>What is the percentage of exam scores that are at least 70? To determine the answer, we need to:\n<ul class=\"decimal\">\n<li>Add together the relative frequencies for the intervals that contain scores of at least 70. Thus, we would need to add together the relative frequencies for [70-80), [80-90), and [90-100]: .33+.13+.07 = .53.<\/li>\n<li>To get the percentage, we need to multiply the calculated relative frequency by 100. 
In this case, it would be .53*100 = 53, or 53%.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Learn by Doing<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p id=\"N10BC9\">Here is the table from above; use it to answer the question.<\/p>\n<table id=\"N10BCC_bx\" class=\"table labeled\">\n<thead>\n<tr>\n<th>Exam Grades<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>\n<table class=\"wbtable\" style=\"border-spacing: 0px; margin: auto;\">\n<thead>\n<tr>\n<th>Score<\/th>\n<th>Count<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>[40\u201350)<\/td>\n<td>1<\/td>\n<\/tr>\n<tr class=\"e\">\n<td>[50\u201360)<\/td>\n<td>2<\/td>\n<\/tr>\n<tr>\n<td>[60\u201370)<\/td>\n<td>4<\/td>\n<\/tr>\n<tr class=\"e\">\n<td>[70\u201380)<\/td>\n<td>5<\/td>\n<\/tr>\n<tr>\n<td>[80\u201390)<\/td>\n<td>2<\/td>\n<\/tr>\n<tr class=\"e\">\n<td>[90\u2013100]<\/td>\n<td>1<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<div id=\"h5p-14\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-14\" class=\"h5p-iframe\" data-content-id=\"14\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"2.1 Learn by doing 3\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Many students wonder&#8230;<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p><strong>Question:<\/strong> How do I know what interval width to choose?<\/p>\n<p><strong>Answer:<\/strong> There are no right or wrong choices of interval widths. 
In this course, we will rely on a statistical package to produce the histogram for us, and we will focus instead on describing and summarizing the distribution as it appears from the histogram.<\/p>\n<\/div>\n<\/div>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p id=\"N1007C\">An instructor asked her students how much time (to the nearest hour) they spent studying for the midterm. The data are displayed in the following histogram:<\/p>\n<div class=\"image shouldbeleft\"><img decoding=\"async\" id=\"N1007E\" class=\"img-responsive popimg aligncenter\" title=\"histogram of time students spend studying\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/histogram9.gif\" alt=\"histogram of time students spend studying\" \/><\/div>\n<div>\n<div id=\"h5p-15\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-15\" class=\"h5p-iframe\" data-content-id=\"15\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"2.1 did I get this 1\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p>Thirty-two students were asked the number of servings of fruits and vegetables they eat daily. 
The results are displayed in the histogram below.<\/p>\n<div class=\"image shouldbeleft\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"histogram of fruit and vegetable consumption\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/didigetthis_histogram1_02.gif\" alt=\"histogram of fruit and vegetable consumption\" \/><\/div>\n<div>\n<div id=\"h5p-16\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-16\" class=\"h5p-iframe\" data-content-id=\"16\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"2.1 did I get this 2\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p>A survey was conducted to see how many phone calls people made daily. 
The results are displayed in the table below:<\/p>\n<div class=\"image shouldbeleft\"><img decoding=\"async\" class=\"img-responsive popimg aligncenter\" title=\"table of calls made daily\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/didigetthis_histogram1_ep2.gif\" alt=\"table of calls made daily\" \/><\/div>\n<div>\n<div id=\"h5p-17\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-17\" class=\"h5p-iframe\" data-content-id=\"17\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"2.1 did I get this 3\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div>\n<div id=\"dc8ac8a1b70b466f8fd70cfe014c3e97\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span style=\"color: #800080;\" title=\"Quick scroll up\">Interpreting the Histogram<\/span><\/h2>\n<p id=\"beb021e50fd5468180a67ed771699bf0\">Once the distribution has been displayed graphically, we can describe the overall pattern of the distribution and mention any striking deviations from that pattern. More specifically, we should consider the following features of the distribution:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"b4e2ff04627446e880960f847d1245bc\" class=\"img-responsive popimg aligncenter\" title=\"The overall pattern of the distribution can be described by the shape, center, and spread of the histogram. Outliers in the distribution are deviations from the pattern.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/histogram10.gif\" alt=\"The overall pattern of the distribution can be described by the shape, center, and spread of the histogram. 
Outliers in the distribution are deviations from the pattern.\" \/><\/span><\/span><\/p>\n<p id=\"efbc0dcb2c7b4d0fbc3cf24d15b73205\">We will get a sense of the overall pattern of the data from the histogram\u2019s center, spread and shape, while outliers will highlight deviations from that pattern.<\/p>\n<\/div>\n<\/div>\n<div id=\"e95531b59c8c4b56b48a7bde1bec2e79\" class=\"section\">\n<div class=\"sectionContain\">\n<h3><strong><span style=\"color: #800080;\" title=\"Quick scroll up\">Distribution Shapes<\/span><\/strong><\/h3>\n<p id=\"b3b214032c344e798cd3389e2f814e3f\">When describing the shape of a distribution, we should consider:<\/p>\n<ol id=\"f77a34ce7f1b4d58bb59518130160b66\">\n<li>\n<p id=\"c7b5397efe644672b73c6a6bcd335083\"><em class=\"italic\">Symmetry\/skewness<\/em>\u00a0of the distribution.<\/p>\n<\/li>\n<li>\n<p id=\"f7be218fe12f49c2ac8aa2e1b0f136f0\"><em class=\"italic\">Peakedness (modality)<\/em>\u2014the number of peaks (modes) the distribution has.<\/p>\n<\/li>\n<\/ol>\n<p id=\"b1cc64e3c34649daa9315f3a2cc99cf2\">We distinguish between:<\/p>\n<\/div>\n<\/div>\n<div id=\"c03b988909814fba9822086b281fc212\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span style=\"color: #800080;\" title=\"Quick scroll up\">Symmetric Distributions<\/span><\/h2>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"a52de33e5d4a4816b0266b016e97cac2\" class=\"img-responsive popimg aligncenter\" title=\"A symmetric, Single-peaked (Unimodal) distribution. The histogram&amp;apos;s bars start at low values close to 0 on the left and rise to a peak where the x-axis is labeled 10. Then, the values decrease as we go right, back down to nearly 0.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/histogram2.gif\" alt=\"A symmetric, Single-peaked (Unimodal) distribution. 
The histogram&amp;apos;s bars start at low values close to 0 on the left and rise to a peak where the x-axis is labeled 10. Then, the values decrease as we go right, back down to nearly 0.\" \/><\/span><\/span><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"ba15ccd0954243018c03e3d98e9a4bb3\" class=\"img-responsive popimg aligncenter\" title=\"A symmetric, Double-peaked (Bimodal) distribution. The histogram&amp;apos;s bars start at low values close to 0 on the left and rise to the first peak where the x-axis is labeled 10. Then, the values decrease as we go right, back down to nearly 0 at roughly where x=15. The values increase again and peak at x=20, and then, continuing right, decrease to nearly 0.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/histogram3.gif\" alt=\"A symmetric, Double-peaked (Bimodal) distribution. The histogram&amp;apos;s bars start at low values close to 0 on the left and rise to the first peak where the x-axis is labeled 10. Then, the values decrease as we go right, back down to nearly 0 at roughly where x=15. The values increase again and peak at x=20, and then, continuing right, decrease to nearly 0.\" \/><\/span><\/span><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"e7980f570e654c5db953807ec1b7d7e5\" class=\"img-responsive popimg aligncenter\" title=\"A symmetric, Uniform distribution. Throughout the entire range of the x-axis the bars are roughly the same height, meaning they are the same value.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/histogram4.gif\" alt=\"A symmetric, Uniform distribution. 
Throughout the entire range of the x-axis the bars are roughly the same height, meaning they are the same value.\" \/><\/span><\/span><\/p>\n<p id=\"b76b97c18d484ee0aeaa6293116dabe5\">Note that all three distributions are symmetric, but differ in their modality (peakedness). The first distribution is\u00a0<em class=\"italic\">unimodal<\/em>\u2014it has one mode (roughly at 10) around which the observations are concentrated. The second distribution is\u00a0<em class=\"italic\">bimodal<\/em>\u2014it has two modes (roughly at 10 and 20) around which the observations are concentrated. The third distribution is roughly flat, or\u00a0<em class=\"italic\">uniform<\/em>. It has no mode, that is, no value around which the observations are concentrated. Rather, we see that the observations are roughly uniformly distributed among the different values.<\/p>\n<\/div>\n<\/div>\n<div id=\"aa2f9127d8604ca3abb4d8f6e6c653e6\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span style=\"color: #800080;\" title=\"Quick scroll up\">Skewed Right Distributions<\/span><\/h2>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"e1971efb01d14d988175116a5ef10ad0\" class=\"img-responsive popimg aligncenter\" title=\"A Skewed-right histogram. As we proceed from left to right across the x-axis, the bars rapidly increase to the peak of the histogram, located at roughly x=33. From there, the values slowly decrease, and the last measurement is at x=200. The bars of the histogram are barely visible above the x-axis starting at about x=150.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/histogram5.gif\" alt=\"A Skewed-right histogram. As we proceed from left to right across the x-axis, the bars rapidly increase to the peak of the histogram, located at roughly x=33. From there, the values slowly decrease, and the last measurement is at x=200. 
The bars of the histogram are barely visible above the x-axis starting at about x=150.\" \/><\/span><\/span><\/p>\n<p id=\"f69173e4b42448d2ac25c71075c7f9a9\">A distribution is called\u00a0<em class=\"italic\">skewed right<\/em>\u00a0if, as in the histogram above, the right tail (larger values) is much longer than the left tail (smaller values). Note that in a skewed right distribution, the bulk of the observations are small\/medium, with a few observations that are much larger than the rest. An example of a real-life variable that has a skewed right distribution is salary. Most people earn in the low\/medium range of salaries, with a few exceptions (CEOs, professional athletes, etc.) that are distributed along a large range (long \u201ctail\u201d) of higher values.<\/p>\n<\/div>\n<\/div>\n<div id=\"c2f3875842da402c8acd5aa32fed43ac\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span style=\"color: #800080;\" title=\"Quick scroll up\">Skewed Left Distributions<\/span><\/h2>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"e019677beb8e4f499535530b77a0e6ab\" class=\"img-responsive popimg aligncenter\" title=\"A Skewed-Left histogram. As we proceed from left to right across the x-axis, the bars rise slowly to the peak of the histogram, located at roughly x=78. From there, the values rapidly decrease, and the last measurement is at x=90. Since the X-axis starts at 0, the peak is offset to the right of the center of the histogram.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/histogram6.gif\" alt=\"A Skewed-Left histogram. As we proceed from left to right across the x-axis, the bars rise slowly to the peak of the histogram, located at roughly x=78. From there, the values rapidly decrease, and the last measurement is at x=90. 
Since the X-axis starts at 0, the peak is offset to the right of the center of the histogram.\" \/><\/span><\/span><\/p>\n<p id=\"dadce93b96954eceae0f36f6e1f7dd42\">A distribution is called\u00a0<em class=\"italic\">skewed left<\/em>\u00a0if, as in the histogram above, the left tail (smaller values) is much longer than the right tail (larger values). Note that in a skewed left distribution, the bulk of the observations are medium\/large, with a few observations that are much smaller than the rest. An example of a real-life variable that has a skewed left distribution is age at death from natural causes (heart disease, cancer, etc.). Most such deaths happen at older ages, with fewer cases happening at younger ages.<\/p>\n<h3 id=\"d16d68627ee44637b7922509c473e222\"><span style=\"color: #800080;\"><strong>Comments<\/strong><\/span><\/h3>\n<ol id=\"e5d982a9a44d454fbaeb71b3042763b6\">\n<li>\n<p id=\"b6bccc28ede443b88b60decf183e28b5\">Note that skewed distributions can also be bimodal. Here is an example. A medium-sized neighborhood 24-hour convenience store collected data from 537 customers on the amount of money spent in a single visit to the store. The following histogram displays the data.<\/p>\n<p id=\"f4ed573b0b504493bf7e6a34e947069e\"><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"f7c9cec70df14232bd5bf8e0de306102\" class=\"img-responsive popimg aligncenter\" title=\"A histogram in which the Y-axis is labeled with units in Frequency, from 0 to 70. The X-axis is labeled in Dollars Spent, from 0 to 105. Going from left to right on the X-axis, the bars of the histogram increase to a peak at x=25, where y=70. Then, the bars decrease, but at x=45 they begin to increase again, reaching a second peak at x=50, where y=37. 
Then, the values decrease until the end of the histogram.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/bimode_hist.png\" alt=\"A histogram in which the Y-axis is labeled with units in Frequency, from 0 to 70. The X-axis is labeled in Dollars Spent, from 0 to 105. Going from left to right on the X-axis, the bars of the histogram increase to a peak at x=25, where y=70. Then, the bars decrease, but at x=45 they begin to increase again, reaching a second peak at x=50, where y=37. Then, the values decrease until the end of the histogram.\" \/><\/span><\/span><\/p>\n<p id=\"c0855bc4ca8b44d0b3d01b1838c90ef8\">Note that the overall shape of the distribution is skewed to the right with a clear mode around $25. In addition it has another (smaller) \u201cpeak\u201d (mode) around $50-55. The majority of the customers spend around $25 but there is a cluster of customers who enter the store and spend around $50-55.<\/p>\n<\/li>\n<li>\n<p id=\"d8e29f599c9e40959b9c5d70262904d8\">If a distribution has more than two modes, we say that the distribution is\u00a0<em class=\"bold\">multimodal<\/em>.<\/p>\n<\/li>\n<\/ol>\n<p id=\"a7f4a9aa0aa74a13ba8305669ba99f38\">Recall our grades example:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"c7c17a788c6a4f6aac61dde2717fdce6\" class=\"img-responsive popimg aligncenter\" title=\"The exam grades histogram from earlier.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/histogram1.gif\" alt=\"The exam grades histogram from earlier.\" \/><\/span><\/span><\/p>\n<p id=\"e5a2fd72b40f4a4cb4eea6a7b75642c3\">As you can see from the histogram, the grades distribution is roughly symmetric.<\/p>\n<div id=\"bf4cc918d2f84f96a03e83c76a220354\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span 
style=\"color: #800080;\" title=\"Quick scroll up\">Center<\/span><\/h2>\n<p id=\"c791f7c7d7614583b9bec6428c8e8019\">The center of the distribution is its\u00a0<em class=\"italic\">midpoint<\/em>\u2014the value that divides the distribution so that approximately half the observations take smaller values, and approximately half the observations take larger values. Note that from looking at the histogram we can get only a rough estimate for the center of the distribution. (More exact ways of finding measures of center will be discussed in the next section.)<\/p>\n<p id=\"c62c7f34f4bf4b658e5044eccd6ba920\">Recall our grades example:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"afa86e794ae64d87800119430ebd3368\" class=\"img-responsive popimg aligncenter\" title=\"The exam grades histogram. The y-axis is labeled count, and the x-axis is labeled score. There is 1 count in 40-50 score interval, 2 counts in the 50-60 score interval, 4 counts in the 60-70 score interval, 5 counts in the 70-80 score interval, 2 counts in the 80-90 score interval, and 1 count in the 90-100 score interval.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/histogram1.gif\" alt=\"The exam grades histogram. The y-axis is labeled count, and the x-axis is labeled score. 
There is 1 count in 40-50 score interval, 2 counts in the 50-60 score interval, 4 counts in the 60-70 score interval, 5 counts in the 70-80 score interval, 2 counts in the 80-90 score interval, and 1 count in the 90-100 score interval.\" \/><\/span><\/span><\/p>\n<p id=\"f9be01cc9b034e979ec8a7a552c4a976\">As you can see from the histogram, the center of the grades distribution is roughly 70 (7 students scored below 70, and 8 students scored above 70).<\/p>\n<\/div>\n<\/div>\n<div id=\"b25918491e9f42fe85d3907f9926cc7f\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span style=\"color: #800080;\" title=\"Quick scroll up\">Spread<\/span><\/h2>\n<p id=\"ab938edcb87741c78d1cc26e9e9e4691\">The\u00a0<em class=\"italic\">spread<\/em>\u00a0(also called\u00a0<em class=\"italic\">variability<\/em>) of the distribution can be described by the approximate range covered by the data. From looking at the histogram, we can approximate the smallest observation (<em class=\"italic\">min<\/em>), and the largest observation (<em class=\"italic\">max<\/em>), and thus approximate the range. 
(More exact ways of finding measures of spread are discussed in the next section.)<\/p>\n<p id=\"ad2343ed220d49109799ea0becfc94d7\">In our example:<\/p>\n<table id=\"f1701bc8cc384c1f9ca3314959577c74_bx\" class=\"table labeled\">\n<tfoot>\n<tr>\n<td class=\"captionwrap\"><\/td>\n<\/tr>\n<\/tfoot>\n<tbody>\n<tr>\n<td>\n<table id=\"f1701bc8cc384c1f9ca3314959577c74\" class=\"wbtable plain\">\n<tbody>\n<tr class=\"e\">\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"ac5a3844e162c42b888321680beecbe48\">Approximate min:<\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"ad84fd3380c044b17b38c4e101f8652c1\">45 (the middle of the lowest interval of scores)<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"acf379f418fe4404ca8db69c1c705715c\">Approximate max:<\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"aa7d902b66548488f82ddf541264b0558\">95 (the middle of the highest interval of scores)<\/p>\n<\/td>\n<\/tr>\n<tr class=\"e\">\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"abb4e235d702e402ebe8db9550899ad9d\">Approximate range:<\/p>\n<\/td>\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"acb654b1a69c841c486567faff0894ad1\">95 \u2212 45 = 50<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<div id=\"c9f7b594a2744ea7839ec64c46bd54c5\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span style=\"color: #800080;\" title=\"Quick scroll up\">Outliers<\/span><\/h2>\n<p id=\"de656c551e0d42c586220c2d6c8c5890\"><em class=\"italic\">Outliers<\/em>\u00a0are observations that fall outside the overall pattern. For example, the following histogram represents a distribution that has a probable high outlier:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"acf1cefefb33419198f599049e4fdf01\" class=\"img-responsive popimg aligncenter\" title=\"A histogram with frequency on the Y-axis. 
As we go from left to right on the x-axis, the frequency increases to a peak at x=5, then decreases. Eventually, we reach 0 at x=11. All values with x &amp;gt; 10 have a frequency of 0, except for x=15, which has a frequency greater than zero. This is an outlier.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/histogram7.gif\" alt=\"A histogram with frequency on the Y-axis. As we go from left to right on the x-axis, the frequency increases to a peak at x=5, then decreases. Eventually, we reach 0 at x=11. All values with x &amp;gt; 10 have a frequency of 0, except for x=15, which has a frequency greater than zero. This is an outlier.\" \/><\/span><\/span><\/p>\n<p id=\"eb2406b7a227438396e03009a2c24540\">Go back and check the histogram of scores at the top of this page. As you can see, there are no outliers.<\/p>\n<div id=\"b9c0592a91b34a769e26d1f7253ac41c\" class=\"examplewrap\">\n<div class=\"exHead\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<h4 class=\"exHead\">Best Actress Oscar Winners<\/h4>\n<div class=\"example clearfix\">\n<p id=\"f7167e2bc4fd4705be859c8bbd0d109d\">To provide an example of a histogram applied to actual data, we will look at the ages of Best Actress Oscar winners from 1970 to 2013. (See the full dataset: <a href=\"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-content\/uploads\/sites\/206\/2025\/01\/Dataset_-Best-Actress-Oscar-Winners-1970\u20132013.docx\">Dataset_ Best Actress Oscar Winners (1970\u20132013)<\/a>.)<\/p>\n<p id=\"b96b59af8ddb4ed696a55341bcc56368\"><a href=\"https:\/\/plot.ly\/~OLI_Stanford\/69.embed?link=false\" target=\"new\" rel=\"noopener\">Review the histogram for the data<\/a>.<\/p>\n<p id=\"deb7e9c7953741e5bb001741940551aa\">We will now summarize the main features of the distribution of ages as it 
appears in the histogram:<\/p>\n<p id=\"eb3d292bad7e4ab6804a8fc15969b207\"><em class=\"italic\">Shape:<\/em>\u00a0The distribution of ages is skewed right. We have a concentration of data among the younger ages and a long tail to the right. The vast majority of the Best Actress awards are given to young actresses, with very few awards given to actresses who are older.<\/p>\n<p id=\"a47911ce2a94401e873e5157d0a50573\"><em class=\"italic\">Center:<\/em>\u00a0The data seem to be centered around 34 or 35 years old. Note that this implies that roughly half the awards are given to actresses who are younger than 34 or 35.<\/p>\n<p id=\"a324582bba1c41099eeee02c6887b3e5\"><em class=\"italic\">Spread:<\/em>\u00a0The data range from about 20 to about 80, so the approximate range equals 80 \u2212 20 = 60.<\/p>\n<p id=\"e5f1a12d0f4d4435a9b49c88d91652a0\"><em class=\"italic\">Outliers:<\/em>\u00a0There seem to be two probable outliers to the far right and possibly three around 62 years old.<\/p>\n<p id=\"d1d486e2d23b4acfb6073cc6e0d1d0ba\">You can see how informative it is to know \u201cwhat to look at\u201d in a histogram. 
If there is one conclusion that we can make here, it is that Hollywood likes to give Oscars to young actresses.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Did I get this?<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<div id=\"h5p-18\">\n<div class=\"h5p-iframe-wrapper\"><iframe id=\"h5p-iframe-18\" class=\"h5p-iframe\" data-content-id=\"18\" style=\"height:1px\" src=\"about:blank\" frameBorder=\"0\" scrolling=\"no\" title=\"2.1 did I get this 4\"><\/iframe><\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<div class=\"example clearfix\">\n<h2><span style=\"color: #800080;\" title=\"Quick scroll up\">Let\u2019s Summarize<\/span><\/h2>\n<ul id=\"a7f98e29719345f2982caaa290ab2012\">\n<li>\n<p id=\"af7c8cfe702134a49bfee440577edc30a\">The histogram is a graphical display of the distribution of a quantitative variable. It plots the number (count) of observations that fall in intervals of values.<\/p>\n<\/li>\n<li>\n<p id=\"af6ac88ffd8cf4f73900c52495f63ef37\">When examining the distribution of a quantitative variable, one should describe the overall pattern of the data (shape, center, spread), and any deviations from the pattern (outliers).<\/p>\n<\/li>\n<li>\n<p id=\"ae46a905f4bd041b78883a024fe4b55d8\">When describing the shape of a distribution, one should consider:<\/p>\n<ul id=\"b97405d248764e8894c3de3791801766\">\n<li>\n<p id=\"af89fa5c460c74d5aa6c32b5cfba2be0c\">Symmetry\/skewness of the distribution<\/p>\n<\/li>\n<li>\n<p id=\"aa3c8c08cc8344f9aac4bc5d171a7cece\">Peakedness (modality)\u2014the number of peaks (modes) the distribution has.<\/p>\n<\/li>\n<\/ul>\n<p id=\"afbc5377f40294356a089c5d14af48878\">*Not all distributions have a simple, recognizable shape.<\/p>\n<\/li>\n<li>\n<p id=\"ad874d214e0dc4ce3b6188c8a404c0112\">Outliers are data points that fall outside the overall pattern of the distribution and need further research before continuing the 
analysis.<\/p>\n<\/li>\n<li>\n<p id=\"ae116673fdc054b04bd6d39f4314d6493\">It is always important to interpret what the features of the distribution (as they appear in the histogram) mean in the context of the data.<\/p>\n<\/li>\n<\/ul>\n<h2><span style=\"color: #800080;\">Stemplot<\/span><\/h2>\n<div class=\"\">\n<p>The stem plot (also called stem and leaf plot) is another graphical display of the distribution of quantitative data.<\/p>\n<\/div>\n<div id=\"bf98171802b8430da609c0837be4117e\" class=\"section\">\n<div class=\"sectionContain\">\n<h3><span title=\"Quick scroll up\">Idea<\/span><\/h3>\n<p id=\"c8504d1cf9784becbe085351f41002ad\">Separate each data point into a stem and leaf, as follows:<\/p>\n<table id=\"db8dc46a215f465496936f28cd48d7e6_bx\" class=\"table labeled\">\n<tfoot>\n<tr>\n<td class=\"captionwrap\"><\/td>\n<\/tr>\n<\/tfoot>\n<tbody>\n<tr>\n<td>\n<table id=\"db8dc46a215f465496936f28cd48d7e6\" class=\"wbtable plain\">\n<tbody>\n<tr class=\"e\">\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"d64fe3b91ab0407f8874859621192470\">The leaf is the right-most digit.<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"d61356229ea74b95ada245d1161a4ed6\">The stem is everything except the right-most digit.<\/p>\n<\/td>\n<\/tr>\n<tr class=\"e\">\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"d324d4fa775344aa850300828b69bbcf\">So, if the data point is 34, then 3 is the stem and 4 is the leaf.<\/p>\n<\/td>\n<\/tr>\n<tr>\n<td colspan=\"1\" rowspan=\"1\" align=\"left\">\n<p id=\"ce682de7f8c24cff83a3a0039ef8fa26\">If the data point is 3.41, then 3.4 is the stem and 1 is the leaf.<\/p>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<\/div>\n<div id=\"fa3a41b70e5b497aac0e6d920cba243b\" class=\"examplewrap\">\n<div class=\"exHead\">\n<div class=\"textbox textbox--examples\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Example<\/h3>\n<\/header>\n<div 
class=\"textbox__content\">\n<div class=\"examplewrap\">\n<h4 class=\"exHead\">Best Actress Oscar Winners<\/h4>\n<div class=\"example clearfix\">\n<div>\n<p id=\"db189709163244ee8700eef59008b7cf\">We will continue with the Best Actress Oscar winners example. (See the full dataset: <a href=\"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-content\/uploads\/sites\/206\/2025\/01\/Dataset_-Best-Actress-Oscar-Winners-1970\u20132013.docx\">Dataset_ Best Actress Oscar Winners (1970\u20132013)<\/a>.)<\/p>\n<p id=\"bab2b1edf74b4fcc8e31e91c6dffee42\">34 34 27 37 42 41 36 32 41 33 31 74 33 49 38 61 21 41 26 80 42 29 33 36 45 49 39 34 26 25 33 35 35 28 30 29 61 32 33 45 29 62 22 44<\/p>\n<p id=\"f3147ead358b405d89e2a7c17c6a2df2\"><em>To make a stem plot:<\/em><\/p>\n<ol id=\"a4955f64c31c47159d296e741b997424\">\n<li>\n<p id=\"f7f643df98b04f4e8215f92e37d5827e\">Separate each observation into a stem and a leaf.<\/p>\n<\/li>\n<li>\n<p id=\"b9775d5e5798480cb443f81dfa473a34\">Write the stems in a vertical column with the smallest at the top, and draw a vertical line at the right of this column.<\/p>\n<\/li>\n<li>\n<p id=\"e0b245a1443f4f908d50631e57353a5d\">Go through the data points, and write each leaf in the row to the right of its stem.<\/p>\n<\/li>\n<li>\n<p id=\"b344d231b28b47519839e13511c55845\">Rearrange the leaves in increasing order.<\/p>\n<\/li>\n<\/ol>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"ea6701605ef5424581df709c38a8afe2\" class=\"img-responsive popimg aligncenter\" title=\"The result of steps 1, 2, and 3 on the given data set results in the following: first row: 2|7169658992 second row: 3|3376231383694355023 third row: 4|2119124954 fourth row: 5| fifth row: 6|112 sixth row: 7|4 seventh row: 8|0 Step 4 results in: first row: 2|1256678999 second row: 3|0122333333445566789 third row: 4|1112244599 fourth row: 5| fifth row: 6|112 sixth row: 7|4 seventh row: 8|0 Following the extra step(*): first row: 2|12 second row: 2|56678999 
third row: 3|012233333344 fourth row: 3|5566789 fifth row: 4|1112244 sixth row: 4|599 seventh row: 5| eighth row: 5| ninth row: 6|112 tenth row: 7|4 eleventh row:7| twelfth row: 8|0\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/eda_examining_distributions_best_actress_stemplot.jpg\" alt=\"The result of steps 1, 2, and 3 on the given data set results in the following: first row: 2|7169658992 second row: 3|3376231383694355023 third row: 4|2119124954 fourth row: 5| fifth row: 6|112 sixth row: 7|4 seventh row: 8|0 Step 4 results in: first row: 2|1256678999 second row: 3|0122333333445566789 third row: 4|1112244599 fourth row: 5| fifth row: 6|112 sixth row: 7|4 seventh row: 8|0 Following the extra step(*): first row: 2|12 second row: 2|56678999 third row: 3|012233333344 fourth row: 3|5566789 fifth row: 4|1112244 sixth row: 4|599 seventh row: 5| eighth row: 5| ninth row: 6|112 tenth row: 7|4 eleventh row:7| twelfth row: 8|0\" \/><\/span><\/span><\/p>\n<p id=\"c1e0f443cb3b4c2388792dc75151a83b\">* When some of the stems hold a large number of leaves, we can split each stem into two: one holding the leaves 0-4, and the other holding the leaves 5-9. A statistical software package will often do the splitting for you, when appropriate.<\/p>\n<p id=\"ce6ad5a030954922b835b279f5acee40\"><em>Note\u00a0<\/em>that when rotated 90 degrees counterclockwise, the stem plot visually resembles a histogram:<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"ee6ba663384840fca565fd33cd8b7923\" class=\"img-responsive popimg aligncenter\" title=\"A rotated stem plot. 
This is the same as the last stem plot given in the previous image, but rotated so that the stems are at the bottom, with the leaves on top.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/eda_examining_distributions_best_actress_stemplot_rotated.jpg\" alt=\"A rotated stem plot. This is the same as the last stem plot given in the previous image, but rotated so that the stems are at the bottom, with the leaves on top.\" \/><\/span><\/span><\/p>\n<p id=\"e738b0fbab9a48c784177743928065f0\">This orientation makes the right-skewedness of the distribution clearly visible.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<p id=\"b92fdbb1e528448ba60cb2d27f06fc22\">The stem plot has additional unique features:<\/p>\n<ul id=\"e5e87270d7a44e12bf4fb095b8abc03f\">\n<li>\n<p id=\"a1f166ce8aa848009a0978de3fb76b20\">It preserves the original data.<\/p>\n<\/li>\n<li>\n<p id=\"b40943fe1a924824a1d705b470f99f0e\">It sorts the data (which will become very useful in the next section).<\/p>\n<\/li>\n<\/ul>\n<h2><span style=\"color: #800080;\">Comment<\/span><\/h2>\n<\/div>\n<\/div>\n<div id=\"df18416eb2b3459a89e1d17317b51e06\" class=\"section\">\n<div class=\"sectionContain\">\n<p id=\"dffd73019194444184d8364c7aa085ab\">There is another type of display that we can use to summarize a quantitative variable graphically\u2014the\u00a0<em>dot plot<\/em>. The dot plot, like the stem plot, shows each observation, but displays it with a dot rather than with its actual value. Here is the dot plot for the ages of Best Actress Oscar winners.<\/p>\n<p><span class=\"imagewrap\"><span class=\"image\"><img decoding=\"async\" id=\"ed46b761648e4c2e8c757cc94e61a38d\" class=\"img-responsive popimg aligncenter\" title=\"A dotplot titled &amp;quot;Dotplot of Age&amp;quot; A number line is at the bottom of the image, labeled in units of age from 24 to 80. 
At each age on the number line, a line of dots, each representing one winner of that age, appears above the place of that age on the number line.\" src=\"https:\/\/oli.cmu.edu\/repository\/webcontent\/72712ec00a0001dc418a87e73e8ebb77\/_u2_summarizing_data\/_m1_examining_distributions\/webcontent\/eda_examining_distributions_best_actress_dotplot.jpg\" alt=\"A dotplot titled &amp;quot;Dotplot of Age&amp;quot; A number line is at the bottom of the image, labeled in units of age from 24 to 80. At each age on the number line, a line of dots, each representing one winner of that age, appears above the place of that age on the number line.\" width=\"800\" \/><\/span><\/span><\/p>\n<\/div>\n<\/div>\n<div id=\"fd2306ffbdf24c98966c4c26367a5aaa\" class=\"section\">\n<div class=\"sectionContain\">\n<h2><span style=\"color: #800080;\" title=\"Quick scroll up\">Let\u2019s Summarize<\/span><\/h2>\n<p id=\"b45dd1a773af459d8c07c085b0b95860\">The stem plot is a simple but useful visual display of quantitative data. 
Its principal virtues are:<\/p>\n<ul id=\"f0a1f4f3902a47e5ad498551b9f77415\">\n<li>\n<p id=\"ed5c01e342e74ed69326cd93d93b0ed3\">Easy and quick to construct for small, simple datasets.<\/p>\n<\/li>\n<li>\n<p id=\"fbd166c58daa442b815973be5c9c2cd7\">Retains the actual data.<\/p>\n<\/li>\n<li>\n<p id=\"c17f077ad7e34bd99b05c7acc1913fdd\">Sorts (ranks) the data.<\/p>\n<\/li>\n<\/ul>\n<div class=\"textbox textbox--exercises\">\n<header class=\"textbox__header\">\n<h3 class=\"textbox__title\">Many students wonder&#8230;<\/h3>\n<\/header>\n<div class=\"textbox__content\">\n<p><strong>Question:<\/strong> How do we know which graph to use: the histogram, stemplot, or dotplot?<\/p>\n<p><strong>Answer:<\/strong> Since for the most part we are not going to deal with very small data sets in this course, we will generally display the distribution of a quantitative variable using a histogram generated by a statistical software package.<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"author":150,"menu_order":8,"template":"","meta":{"pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[48],"contributor":[],"license":[],"class_list":["post-450","chapter","type-chapter","status-publish","hentry","chapter-type-numberless"],"part":413,"_links":{"self":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters\/450","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/users\/150"}],"version-history":[{"count":16,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/cha
pters\/450\/revisions"}],"predecessor-version":[{"id":970,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters\/450\/revisions\/970"}],"part":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/parts\/413"}],"metadata":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapters\/450\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/media?parent=450"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/pressbooks\/v2\/chapter-type?post=450"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/contributor?post=450"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/pressbooks.ccconline.org\/mat1260\/wp-json\/wp\/v2\/license?post=450"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}