6.1: Properties of the Normal Distribution

Introduction to Normal Random Variables

In the Exploratory Data Analysis unit of this course, we encountered data sets, such as lengths of human pregnancies, whose distributions naturally followed a symmetric unimodal bell shape, bulging in the middle and tapering off at the ends.

The symmetric unimodal bell shape.

Many variables, such as pregnancy lengths, shoe sizes, foot lengths, and other human physical characteristics exhibit these properties: symmetry indicates that the variable is just as likely to take a value a certain distance below its mean as it is to take a value that same distance above its mean; the bell-shape indicates that values closer to the mean are more likely, and it becomes increasingly unlikely to take values far from the mean in either direction. The particular shape exhibited by these variables has been studied since the early part of the nineteenth century, when they were first called “normal” as a way of suggesting their depiction of a common, natural pattern.

Observations of Normal Distributions

There are many normal distributions. Even though all of them have the bell-shape, they vary in their center and spread.

Three normal distribution curves which vary in height and mean.

More specifically, the center of the distribution is determined by its mean (μ) and the spread is determined by its standard deviation (σ).

Some observations we can make as we look at this graph are:

  • The black and the red normal curves have means or centers at μ = 10. However, the red curve is more spread out and thus has a larger standard deviation.

    As you look at these two normal curves, notice that the red graph is flatter as well as wider: a larger spread comes with a lower peak, so the total area under the curve remains the same.

  • The black and the green normal curves have the same standard deviation or spread (the range of the black curve is 6.5-13.5, and the green curve’s range is 10.5-17.5).
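As a rough numerical check of these observations, the sketch below evaluates the normal density for two curves that share the mean μ = 10 from the graph. The two standard deviations (1.0 and 2.0) and the helper names normal_pdf and area_under_curve are our own assumptions for illustration, not values read from the figure.

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_under_curve(mu, sigma, steps=20000):
    """Midpoint-rule approximation of the area between mu - 8*sigma and mu + 8*sigma."""
    lo = mu - 8 * sigma
    width = (16 * sigma) / steps
    total = sum(normal_pdf(lo + (i + 0.5) * width, mu, sigma) for i in range(steps))
    return total * width

# Same center (mu = 10, from the text); the sigma values are hypothetical.
narrow_peak = normal_pdf(10, 10, 1.0)
wide_peak = normal_pdf(10, 10, 2.0)

print(narrow_peak > wide_peak)                # the more spread-out curve has the lower peak
print(round(area_under_curve(10, 1.0), 6))    # area is approximately 1
print(round(area_under_curve(10, 2.0), 6))    # area is approximately 1
```

Doubling the standard deviation halves the height at the center, which is how the wider curve keeps the same total area.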

Even more important than the fact that many variables themselves follow the normal curve is the role played by the normal curve in sampling theory, as we’ll see in the next module of probability. Understanding the normal distribution is an important step in the direction of our overall goal, which is to relate sample means or proportions to population means or proportions. The goal of this section is to better understand normal random variables and their distributions.

The Standard Deviation Rule for Normal Random Variables

We began to get a feel for normal distributions in the Exploratory Data Analysis (EDA) section, when we introduced the Standard Deviation Rule (or the 68-95-99.7 rule) for how values in a normally-shaped sample data set behave relative to their mean (x̄) and standard deviation (s). This is the same rule that dictates how the distribution of a normal random variable behaves relative to its mean μ and standard deviation σ. Now we use probability language and notation to describe the random variable’s behavior. For example, in the EDA section, we would have said “68% of pregnancies in our data set fall within 1 standard deviation (s) of their mean (x̄).” The analogous statement now would be “If X, the length of a randomly chosen pregnancy, is normal with mean (μ) and standard deviation (σ), then 0.68 = P(μ - σ < X < μ + σ).”

In general, if X is a normal random variable, then the probability is

68% that X falls within 1σ of μ, that is, in the interval μ ± σ

95% that X falls within 2σ of μ, that is, in the interval μ ± 2σ

99.7% that X falls within 3σ of μ, that is, in the interval μ ± 3σ

Using probability notation, we may write

0.68 = P(μ - σ < X < μ + σ)

0.95 = P(μ - 2σ < X < μ + 2σ)

0.997 = P(μ - 3σ < X < μ + 3σ)

A normal bell curve with some ranges marked. At the center (peak) of the bell curve is μ. The area under the bell curve between μ-σ and μ+σ makes up 0.68 of the total area under the bell curve (which is 1). The area under the bell curve between μ-2σ and μ+2σ is 0.95 of the total area under the curve. Capturing even more area under the bell curve is the range from μ-3σ to μ+3σ, which is 0.997 of the total area under the curve.
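The three probabilities in the rule are rounded values. As a minimal sketch, assuming we compute the normal cumulative probability from the error function in Python's standard math module (the helper names normal_cdf and prob_within are ours), we can see how close the rounded figures are:

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for a normal random variable with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def prob_within(k, mu=0.0, sigma=1.0):
    """P(mu - k*sigma < X < mu + k*sigma)."""
    return normal_cdf(mu + k * sigma, mu, sigma) - normal_cdf(mu - k * sigma, mu, sigma)

for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The result does not depend on the particular μ and σ passed in, which is why a single rule covers every normal distribution.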

Comment

Notice that the information from the rule can be interpreted from the perspective of the tails of the normal curve: since .68 is the probability of being within 1 standard deviation of the mean, (1 – .68) / 2 = .16 is the probability of being further than 1 standard deviation below the mean (or further than 1 standard deviation above the mean). Likewise, (1 – .95) / 2 = .025 is the probability of being more than 2 standard deviations below (or above) the mean; (1 – .997) / 2 = .0015 is the probability of being more than 3 standard deviations below (or above) the mean. The three figures below illustrate this.

A normal bell curve, in which μ, μ-σ, and μ+σ have been marked on the horizontal axis. The probability of being within one standard deviation of μ, or between μ-σ and μ+σ, is .68 . The probability of being further than one standard deviation below the mean is the area under the bell curve to the left of μ-σ, which is .16 . Likewise, the probability of being further than one standard deviation above the mean is the area under the bell curve to the right of μ+σ, which is .16 .

A normal bell curve. The probability of being within two standard deviations of μ is .95 . The probability of being further than two standard deviations below the mean is the area under the bell curve to the left of μ-2σ, which is .025 . The probability of being further than two standard deviations above the mean is the area under the bell curve to the right of μ+2σ, which is .025 .

A normal bell curve. The probability of being within three standard deviations of μ is .997 . The probability of being further than three standard deviations below the mean is the area under the bell curve to the left of μ-3σ, which is .0015 . The probability of being further than three standard deviations above the mean is the area under the bell curve to the right of μ+3σ, which is .0015 .
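The tail arithmetic in the comment above can be written out as a short sketch. The dictionary RULE and the function one_tail are illustrative names of our own; the probabilities are the rounded values from the Standard Deviation Rule.

```python
# Rounded Standard Deviation Rule: number of standard deviations -> probability within
RULE = {1: 0.68, 2: 0.95, 3: 0.997}

def one_tail(k):
    """Probability of being more than k standard deviations below (or above) the mean.

    By symmetry, the area outside the middle interval splits evenly between the two tails.
    """
    return (1 - RULE[k]) / 2

for k in RULE:
    print(k, round(one_tail(k), 4))
# 1 0.16
# 2 0.025
# 3 0.0015
```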

Example

Suppose that X, the foot length (in inches) of a randomly chosen adult male, is a normal random variable with mean μ=11 and standard deviation σ=1.5. Then the Standard Deviation Rule lets us sketch the probability distribution of X as follows:

A probability distribution curve in which the horizontal axis is labeled "X - Foot Length." The curve is a normal bell curve. The mean, μ, is at X=11. The first standard deviation is at X=9.5 and X=12.5, with probability of .68 . The second standard deviation is X=8 and X=14, with probability of .95 . The third standard deviation is at X=6.5 and X=15.5, with probability of .997 .

(a) What is the probability that a randomly chosen adult male will have a foot length between 8 and 14 inches? .95, or 95%.

(b) An adult male is almost guaranteed (.997 probability) to have a foot length between what two values? 6.5 and 15.5 inches.

(c) The probability is only 2.5% that an adult male will have a foot length greater than how many inches? 14.

The probability distribution curve for foot lengths. The 2nd standard deviation boundaries from the mean have been marked at X=8 and X=14 . The probability of being within 2 standard deviations of the mean is .95, and the probability of being above 2 standard deviations (X > 14) is .025 . The probability of being below the 2nd standard deviation (X < 8) is also .025 .
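The interval endpoints used in parts (a) through (c) come straight from μ ± kσ. Here is a minimal sketch of that calculation, where the function name sd_rule_intervals is our own:

```python
def sd_rule_intervals(mu, sigma):
    """Map each Standard Deviation Rule probability to its interval (mu - k*sigma, mu + k*sigma)."""
    rule = {0.68: 1, 0.95: 2, 0.997: 3}
    return {p: (mu - k * sigma, mu + k * sigma) for p, k in rule.items()}

# Foot length example from the text: mu = 11, sigma = 1.5 (inches)
foot = sd_rule_intervals(11, 1.5)

print(foot[0.68])    # (9.5, 12.5)
print(foot[0.95])    # (8.0, 14.0)  -> part (a)
print(foot[0.997])   # (6.5, 15.5)  -> part (b)

# Part (c): the upper 2.5% tail starts at the top of the 95% interval.
print(foot[0.95][1])  # 14.0
```

Parts (a) and (b) read the table in opposite directions: (a) starts from an interval and looks up its probability, while (b) and (c) start from a probability and look up the interval.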

Now you should try a few. (Use the figure that is just before part (a) to help you.)

Learn by Doing

Comment

Notice that there are two types of problems we may want to solve: those like (a), (d), and (e), in which a particular interval of values of a normal random variable is given, and we are asked to find a probability, and those like (b), (c), and (f), in which a probability is given and we are asked to identify what the normal random variable’s values would be.

Did I get this?

Length (in days) of human pregnancies is a normal random variable (X) with mean 266 and standard deviation 16.

(It would be useful to sketch this normal distribution yourself, marking its mean and the values that are 1, 2, and 3 standard deviations below and above the mean.)

Let’s go back to our example of foot length:

How likely or unlikely is it for a male’s foot length to be more than 13 inches?

The probability distribution curve for foot lengths. It takes the shape of a normal bell curve. The boundaries for the first, second, and third standard deviations have been marked, and we see that no line falls on X=13 . We need the area under the bell curve to the right of X=13 .

Since 13 inches doesn’t happen to be exactly 1, 2, or 3 standard deviations away from the mean, we would only be able to give a very rough estimate of the probability at this point. Clearly, the Standard Deviation Rule only describes the tip of the iceberg, and while it serves well as an introduction to the normal curve, and gives us a good sense of what would be considered likely and unlikely values, it is very limited in the probability questions it can help us answer.

Here is another familiar normal distribution:

A normal bell curve representing the probability distribution curve for the scores on the math portion of the SAT. The horizontal axis is labeled "X - SAT scores." μ = 500, and σ = 100 . We want to know the probability that a student scores 633 or higher. This is the area under the bell curve to the right of X=633 . Note that 633 is not covered by the Standard Deviation Rule.

Suppose we are interested in knowing the probability that a randomly selected student will score 633 or more on the math portion of his or her SAT (this is represented by the red area). Again, 633 does not fall exactly 1, 2, or 3 standard deviations above the mean. Notice, however, that an SAT score of 633 and a foot length of 13 are both about 1/3 of the way between 1 and 2 standard deviations. As you continue to read this page, you’ll realize that this positioning relative to the mean is the key to finding probabilities.
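To see that the two values sit at about the same position, we can measure how many standard deviations each one lies above its mean. This is a minimal sketch using the means and standard deviations given above; the function name sds_from_mean is our own.

```python
def sds_from_mean(x, mu, sigma):
    """Number of standard deviations that x lies above (positive) or below (negative) mu."""
    return (x - mu) / sigma

foot = sds_from_mean(13, 11, 1.5)    # foot length of 13 inches
sat = sds_from_mean(633, 500, 100)   # SAT math score of 633

print(round(foot, 2))  # 1.33
print(round(sat, 2))   # 1.33
```

Both values come out near 1.33, that is, about one third of the way from 1 to 2 standard deviations above the mean, even though the two variables are measured in entirely different units.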
