2.4: Standard Deviation

Introduction

So far, we have introduced two measures of spread: the range, which covers all the data, and the inter-quartile range (IQR), which looks at the range covered by the middle 50% of the distribution. We also noted that the IQR should be paired as a measure of spread with the median as a measure of center. We now move on to another measure of spread, the standard deviation, which quantifies the spread of a distribution in a completely different way.

Idea

The idea behind the standard deviation is to quantify the spread of a distribution by measuring how far the observations are from their mean, [latex]\bar{x}[/latex]. The standard deviation gives the average (or typical) distance between a data point and the mean, [latex]\bar{x}[/latex].

Notation

There are many notations for the standard deviation: SD, s, Sd, StDev. Here, we’ll use SD as an abbreviation for standard deviation, and use s as the symbol.

Calculation

In order to get a better understanding of the standard deviation, it would be useful to see an example of how it is calculated. In practice, we will use a computer to do the calculation. 

Example

Video Store Customers

The following are the numbers of customers who entered a video store in 8 consecutive hours: 7, 9, 5, 13, 3, 11, 15, 9

To find the standard deviation of the number of hourly customers:

  1. Find the mean, [latex]\bar{x}[/latex], of your data: [latex]\frac{7+9+5+\cdots+9}{8}=\frac{72}{8}=9[/latex]
  2. Find the deviations from the mean: the difference between each observation and the mean

    (7 – 9), (9 – 9), (5 – 9), (13 – 9), (3 – 9), (11 – 9), (15 – 9), (9 – 9)

    -2, 0, -4, 4, -6, 2, 6, 0

    Since the standard deviation is the average (typical) distance between the data points and their mean, it would make sense to average the deviations we got. Note, however, that the sum of the deviations from the mean, [latex]\bar{x}[/latex], is 0 (add them up and see for yourself). This is always the case, and it is the reason why we have to do a more complicated calculation to determine the standard deviation:

  3. Square each of the deviations:

    The first few are

    [latex](-2)^2=4[/latex], [latex](0)^2=0[/latex], [latex](-4)^2=16[/latex], and the rest are 16, 36, 4, 36, 0.

  4. Average the squared deviations by adding them up and dividing by n – 1 (one less than the sample size):

    [latex]\frac{(4 + 0 + 16 + 16 + 36 + 4 + 36 + 0)}{8-1}=\frac{112}{7}=16[/latex]

    • The reason why we “sort of” average the squared deviations (divide by n – 1) rather than take the actual average (divide by n) is beyond the scope of the course at this point, but will be addressed later.

    • This average of the squared deviations is called the variance of the data.

  5. The SD of the data is the square root of the variance: [latex]SD=\sqrt{16}=4[/latex]

    • Why do we take the square root? Note that 16 is an average of the squared deviations, and is therefore measured in different units than the data. In this case, 16 is measured in “squared customers,” which has no meaningful interpretation. We therefore take the square root to compensate for the fact that we squared our deviations, and to return to the original unit of measurement.
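The five steps above can be reproduced in a few lines of Python. This is a minimal sketch: the variable names are our own, and the last line simply checks the hand calculation against `statistics.stdev` from Python's standard library, which also divides by n – 1.

```python
import math
import statistics

# Number of customers entering the video store in 8 consecutive hours
customers = [7, 9, 5, 13, 3, 11, 15, 9]
n = len(customers)

# Step 1: the mean
mean = sum(customers) / n                    # 72 / 8 = 9.0

# Step 2: deviations from the mean (their sum is always 0)
deviations = [x - mean for x in customers]   # -2, 0, -4, 4, -6, 2, 6, 0

# Step 3: square each deviation
squared = [d ** 2 for d in deviations]       # 4, 0, 16, 16, 36, 4, 36, 0

# Step 4: "average" the squared deviations by dividing by n - 1 (the variance)
variance = sum(squared) / (n - 1)            # 112 / 7 = 16.0

# Step 5: the SD is the square root of the variance
sd = math.sqrt(variance)                     # 4.0

print(mean, sum(deviations), variance, sd)
print(statistics.stdev(customers))           # library result for comparison
```

Both the step-by-step calculation and the library function give an SD of 4.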

Recall that the average number of customers who enter the store in an hour is 9. The interpretation of SD = 4 is that on average, the actual number of customers that enter the store each hour is 4 away from 9.

Comment:

The importance of the quantity we found in step 4 above, called the variance (16 in our example), will be discussed much later in the course, when we get to the inference part.


Properties of the Standard Deviation

  1. It should be clear from the discussion thus far that the SD should be paired as a measure of spread with the mean as a measure of center.

  2. Note that, mathematically, the only way the SD can equal 0 is when all the observations have the same value (e.g., 5, 5, 5, … , 5), in which case the deviations from the mean (which is also 5) are all 0. This is intuitive: if all the data points have the same value, there is no variability (spread) in the data, and we expect a measure of spread (like the SD) to be 0. Indeed, in this case, not only is the SD equal to 0, but the range and the IQR are also equal to 0. Do you understand why?

  3. Like the mean, the SD is strongly influenced by outliers in the data. Consider the example concerning video store customers: 3, 5, 7, 9, 9, 11, 13, 15 (data ordered). If the largest observation were wrongly recorded as 150, then the mean would jump up to [latex]\bar{x}=25.9[/latex], and the standard deviation would jump up to SD = 50.3. Note that in this simple example, it is easy to see that while the standard deviation is strongly influenced by outliers, the IQR is not. The IQR would be the same in both cases since, like the median, the quartiles depend only on the order of the data rather than on the actual values.
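The last two properties are easy to check numerically. The sketch below compares the mean, SD, range, and IQR of the ordered video store data with the version in which 15 is mis-recorded as 150, and also includes a constant data set. It assumes the quartiles are found as the medians of the lower and upper halves of the ordered data (leaving out the overall median when n is odd); the helper name `iqr` is our own, and other quartile conventions can give slightly different values.

```python
import statistics

def iqr(data):
    """IQR as the median of the upper half minus the median of the lower half.
    For odd n, the overall median is left out of both halves."""
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[len(xs) - half:]
    return statistics.median(upper) - statistics.median(lower)

correct = [3, 5, 7, 9, 9, 11, 13, 15]
typo = [3, 5, 7, 9, 9, 11, 13, 150]   # largest value wrongly recorded as 150
constant = [5, 5, 5, 5, 5]            # no variability at all

for name, data in [("correct", correct), ("typo", typo), ("constant", constant)]:
    mean = statistics.mean(data)
    sd = statistics.stdev(data)       # sample SD, divides by n - 1
    print(f"{name:8}  mean={mean:.1f}  SD={sd:.1f}  "
          f"range={max(data) - min(data)}  IQR={iqr(data)}")
```

The mean jumps from 9 to about 25.9 and the SD from 4 to about 50.3, while the IQR stays at 6 in both cases. For the constant data set, the SD, range, and IQR are all 0.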

The last comment leads to the following very important conclusion:

Choosing Numerical Summaries

Use [latex]\bar{x}[/latex] (the mean) and the standard deviation as measures of center and spread only for reasonably symmetric distributions with no outliers.

Use the five-number summary (which gives the median, IQR and range) for all other cases.

The Standard Deviation Rule

In the previous activity we tried to help you develop better intuition about the concept of standard deviation. The rule that we are about to present, called “The Standard Deviation Rule” (also known as “The Empirical Rule”) will hopefully also contribute to building your intuition about this concept.

Consider a symmetric mound-shaped distribution:

[Figure: A symmetric, mound-shaped histogram.]

For distributions having this shape (also known as the normal shape), the following rule applies:

The Standard Deviation Rule:

  • Approximately 68% of the observations fall within 1 standard deviation of the mean.
  • Approximately 95% of the observations fall within 2 standard deviations of the mean.
  • Approximately 99.7% (or virtually all) of the observations fall within 3 standard deviations of the mean.

The following picture illustrates this rule:

[Figure: A symmetric, mound-shaped histogram with the mean at its center. The middle 68% of the observations fall within 1 standard deviation of the mean, the middle 95% within 2 standard deviations, and the middle 99.7% within 3 standard deviations.]

This rule provides another way to interpret the standard deviation of a distribution, and thus also provides a bit more intuition about it.
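The percentages in the rule come from the normal distribution and are rounded. As a sketch, Python's `statistics.NormalDist` (standard library, Python 3.8 or later) can compute the fraction of a normal distribution that lies within k standard deviations of its mean; the function name `fraction_within` is our own.

```python
from statistics import NormalDist

z = NormalDist(mu=0, sigma=1)   # standard normal distribution

def fraction_within(k):
    """Fraction of a normal distribution within k SDs of its mean."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {fraction_within(k):.1%}")
```

This gives about 68.3%, 95.4%, and 99.7%, which the Standard Deviation Rule rounds to 68%, 95%, and 99.7%.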

To see how this rule works in practice, consider the following example:

Example

Male Height

The following histogram represents the heights (in inches) of 50 males. Note that the data are roughly normal, so we would like to see how the Standard Deviation Rule works for this example.

[Figure: Histogram of the heights of 50 males. Horizontal axis: Height; vertical axis: Frequency. The histogram is roughly symmetric and peaks at around 71 inches.]

Below are the actual data and the numerical summaries of the distribution. The key players here are the mean and the standard deviation. In the data, “[” and “]” mark the observations within 1 standard deviation of the mean, and “(” and “)” mark the observations within 2 standard deviations of the mean:

64 (66 66 67 67 67 67 [68 68 68 68 68 68 69 69 69 69 69 70 70 70 70 70 70 70 71 71 71 71 71 71 71 72 72 72 72 72 72 73 73 73] 74 74 74 74 74 75 76 76) 77

Statistic Height
N 50
Mean 70.58
StDev 2.858
min 64
Q1 68
Median 70.5
Q3 72
Max 77

To see how well the Standard Deviation Rule works for this case, we will find what percentage of the observations falls within 1, 2, and 3 standard deviations of the mean, and compare it to what the Standard Deviation Rule tells us this percentage should be.

  • Within 1 SD: mean – SD = 67.7 and mean + SD = 73.4. This interval captures 34 of the 50 observations, or 68%. The Standard Deviation Rule says 68%.
  • Within 2 SD: mean – 2(SD) = 64.9 and mean + 2(SD) = 76.3. This interval captures 48 of the 50 observations, or 96%. The Standard Deviation Rule says 95%.
  • Within 3 SD: mean – 3(SD) = 62.0 and mean + 3(SD) = 79.2. This interval captures all 50 observations, or 100%. The Standard Deviation Rule says 99.7%.

It turns out the Standard Deviation Rule works very well in this example.
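The counts above can be checked with a short Python sketch. It rebuilds the 50 heights from the listed data, computes the mean and SD (using `statistics.stdev`, which divides by n – 1), and counts how many observations fall within 1, 2, and 3 standard deviations of the mean. The dictionary of counts is just our own compact way of typing in the data.

```python
import statistics

# Heights (inches) of the 50 males, written as value: number of occurrences
counts = {64: 1, 66: 2, 67: 4, 68: 6, 69: 5, 70: 7, 71: 7,
          72: 6, 73: 3, 74: 5, 75: 1, 76: 2, 77: 1}
heights = [h for h, c in counts.items() for _ in range(c)]

mean = statistics.mean(heights)   # 70.58
sd = statistics.stdev(heights)    # about 2.858

for k in (1, 2, 3):
    low, high = mean - k * sd, mean + k * sd
    inside = sum(low <= h <= high for h in heights)
    share = inside / len(heights)
    print(f"within {k} SD ({low:.1f} to {high:.1f}): {inside}/{len(heights)} = {share:.0%}")
```

It reports 34/50 = 68%, 48/50 = 96%, and 50/50 = 100%, matching the counts above.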

The following example illustrates how we can apply the Standard Deviation Rule to variables whose distribution is known to be approximately normal.

Example

Length of Human Pregnancy

The length of human pregnancy is not fixed. It is known to vary according to a distribution that is roughly normal, with a mean of 266 days and a standard deviation of 16 days. (Source: Figures are from Moore and McCabe, Introduction to the Practice of Statistics).

First, let’s apply the Standard Deviation Rule to this case by drawing a picture:

[Figure: The distribution of pregnancy length, in days, centered at the mean of 266 days. The middle 68% (within 1 standard deviation) spans 250 to 282 days, the middle 95% (within 2 standard deviations) spans 234 to 298 days, and the middle 99.7% (within 3 standard deviations) spans 218 to 314 days.]
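The intervals in the picture are simply the mean plus or minus 1, 2, and 3 standard deviations. A minimal sketch of that arithmetic follows; the `rule` dictionary is our own way of restating the Standard Deviation Rule's percentages.

```python
mean, sd = 266, 16   # length of human pregnancy, in days

# Percentage of observations the Standard Deviation Rule places within k SDs
rule = {1: 68, 2: 95, 3: 99.7}

for k, pct in rule.items():
    low, high = mean - k * sd, mean + k * sd
    print(f"middle {pct}%: {low} to {high} days")
```

This prints the intervals 250 to 282, 234 to 298, and 218 to 314 days.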

We can now use the information provided by the Standard Deviation Rule about the distribution of the length of human pregnancy, to answer some questions. For example:

Question:

How long do the middle 95% of human pregnancies last?

Answer:

The middle 95% of pregnancies last within 2 standard deviations of the mean, which in this case is between 234 and 298 days.

Question:

What percent of pregnancies last more than 298 days?

Answer:

To answer this, consider the following picture:

[Figure: The middle 95% of the distribution, with the area outside it shaded red. The two red tails together make up the remaining 5%; because the normal distribution is symmetric, each tail contains 2.5%.]

Since 95% of the pregnancies last between 234 and 298 days, the remaining 5% of pregnancies last either less than 234 days or more than 298 days. Since the normal distribution is symmetric, these 5% of pregnancies are divided evenly between the two tails, and therefore 2.5% of pregnancies last more than 298 days.

Question:

How short are the shortest 2.5% of pregnancies?

Answer:

Using the same reasoning as in the previous question, the shortest 2.5% of human pregnancies last less than 234 days.

Question:

What percentage of human pregnancies last more than 266 days?

Answer:

Since 266 days is the mean, approximately 50% of pregnancies last more than 266 days.
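The answers above rely on two facts: the rule's percentage for the middle of the distribution, and symmetry, which splits the remaining percentage evenly between the two tails. Below is a sketch of that reasoning in Python. The function `percent_above` is a hypothetical helper of our own, and it only handles values that lie a whole number of standard deviations (up to 3) from the mean, since those are the only values the rule covers.

```python
mean, sd = 266, 16   # length of human pregnancy, in days

# Standard Deviation Rule: percentage within k SDs of the mean
rule = {1: 68, 2: 95, 3: 99.7}

def percent_above(value):
    """Percent of observations above `value`, assuming value = mean + k*SD
    for a whole number k between -3 and 3 (hypothetical helper)."""
    k, remainder = divmod(value - mean, sd)
    if remainder != 0 or abs(k) > 3:
        raise ValueError("value must be mean + k*SD with k in -3..3")
    if k == 0:
        return 50.0                      # symmetry: half lies above the mean
    tail = (100 - rule[abs(k)]) / 2      # each tail gets half of what is left
    return tail if k > 0 else 100 - tail

print(percent_above(298))        # percent of pregnancies longer than 298 days
print(100 - percent_above(234))  # percent of pregnancies shorter than 234 days
print(percent_above(266))        # percent of pregnancies longer than the mean
```

This reproduces the answers above: 2.5% of pregnancies last more than 298 days, the shortest 2.5% last less than 234 days, and 50% last more than 266 days.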

In general, the larger the animal, the longer the pregnancy (also called the gestation period). For the horse, for example, the gestation period varies roughly according to a normal distribution with a mean of 336 days and a standard deviation of 3 days (Source: These figures are from Moore and McCabe, Introduction to the Practice of Statistics).

Use the Standard Deviation Rule to answer questions about this distribution. This picture of the SD rule applied to the horse gestation period will help:

 

[Figure: A normal distribution of horse gestation period, spanning 327 to 345 days, with a mean of 336 days and a standard deviation of 3 days. The region within 1 standard deviation of the mean, from 333 to 339 days, is shaded red and accounts for 68% of pregnancies. The two unshaded areas to the left and right of the middle 68% account for 16% each, for a total of 32% outside 1 standard deviation of the mean.]
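The same arithmetic applies to the horse. With a mean of 336 days and a standard deviation of 3 days, the sketch below lists the intervals the rule describes and the percentage in each tail beyond them; as in the pregnancy example, the tail percentages follow from the symmetry of the normal distribution.

```python
mean, sd = 336, 3   # horse gestation period, in days

rule = {1: 68, 2: 95, 3: 99.7}   # Standard Deviation Rule percentages

for k, pct in rule.items():
    low, high = mean - k * sd, mean + k * sd
    tail = (100 - pct) / 2   # percentage below low (and, by symmetry, above high)
    print(f"middle {pct}%: {low} to {high} days; {tail:g}% in each tail")
```

For 1 standard deviation this gives 333 to 339 days with 16% in each tail, matching the picture.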


Let’s Summarize

  • The standard deviation measures the spread by reporting a typical (average) distance between the data points and their average.

  • It is appropriate to use the SD as a measure of spread with the mean as the measure of center.

  • Since the mean and standard deviation are highly influenced by extreme observations, they should be used as numerical descriptions of the center and spread only for distributions that are roughly symmetric and have no outliers.

  • For symmetric mound-shaped distributions, the Standard Deviation Rule tells us what percentage of the observations falls within 1, 2, and 3 standard deviations of the mean, and thus provides another way to interpret the standard deviation’s value for distributions of this type.
