6.1: Properties of the Normal Distribution
Introduction to Normal Random Variables
In the Exploratory Data Analysis unit of this course, we encountered data sets, such as lengths of human pregnancies, whose distributions naturally followed a symmetric unimodal bell shape, bulging in the middle and tapering off at the ends.
Many variables, such as pregnancy lengths, shoe sizes, foot lengths, and other human physical characteristics, exhibit these properties. Symmetry indicates that the variable is just as likely to take a value a certain distance below its mean as it is to take a value that same distance above its mean. The bell shape indicates that values closer to the mean are more likely, and that values become increasingly unlikely the farther they lie from the mean in either direction. The particular shape exhibited by these variables has been studied since the early part of the nineteenth century, when such distributions were first called “normal” as a way of suggesting their depiction of a common, natural pattern.
Observations of Normal Distributions
There are many normal distributions. Even though all of them have the bell-shape, they vary in their center and spread.
More specifically, the center of the distribution is determined by its mean (μ) and the spread is determined by its standard deviation (σ).
Some observations we can make as we look at this graph are:
- The black and the red normal curves have means or centers at μ = 10. However, the red curve is more spread out and thus has a larger standard deviation. As you look at these two normal curves, notice that as the red graph is squished down, the spread gets larger, thus allowing the area under the curve to remain the same.
- The black and the green normal curves have the same standard deviation or spread (the range of the black curve is 6.5 to 13.5, and the green curve’s range is 10.5 to 17.5).
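The two observations above can be checked numerically. The following is a minimal sketch, not part of the original lesson: the standard deviations (1 and 2) are hypothetical values chosen only for illustration, and the green curve's mean of 14 is inferred from the ranges quoted above.

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def area_under_curve(mu, sigma, steps=20_000):
    """Trapezoid-rule area over mu +/- 8 sigma, which covers essentially the whole curve."""
    lo, hi = mu - 8 * sigma, mu + 8 * sigma
    width = (hi - lo) / steps
    total = 0.5 * (normal_pdf(lo, mu, sigma) + normal_pdf(hi, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lo + i * width, mu, sigma)
    return total * width

# Hypothetical parameters: the sigma values are assumptions, not taken from the text.
curves = {"black": (10, 1.0), "red": (10, 2.0), "green": (14, 1.0)}

for name, (mu, sigma) in curves.items():
    peak = normal_pdf(mu, mu, sigma)
    area = area_under_curve(mu, sigma)
    print(f"{name}: peak height = {peak:.4f}, area = {area:.4f}")
```

Under these assumed values, doubling the standard deviation halves the peak height, while the area under every curve stays at about 1. Shifting the mean moves the curve without changing its shape.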
Even more important than the fact that many variables themselves follow the normal curve is the role played by the normal curve in sampling theory, as we’ll see in the next module of probability. Understanding the normal distribution is an important step in the direction of our overall goal, which is to relate sample means or proportions to population means or proportions. The goal of this section is to better understand normal random variables and their distributions.
The Standard Deviation Rule for Normal Random Variables
We began to get a feel for normal distributions in the Exploratory Data Analysis (EDA) section, when we introduced the Standard Deviation Rule (or the 68-95-99.7 rule) for how values in a normally-shaped sample data set behave relative to their mean (x̄) and standard deviation (s). This is the same rule that dictates how the distribution of a normal random variable behaves relative to its mean μ and standard deviation σ. Now we use probability language and notation to describe the random variable’s behavior. For example, in the EDA section, we would have said “68% of pregnancies in our data set fall within 1 standard deviation (s) of their mean (x̄).” The analogous statement now would be “If X, the length of a randomly chosen pregnancy, is normal with mean (μ) and standard deviation (σ), then 0.68=P(μ−σ<X<μ+σ).”
In general, if X is a normal random variable, then the probability is
68% that X falls within 1σ of μ, that is, in the interval μ±σ
95% that X falls within 2σ of μ, that is, in the interval μ±2σ
99.7% that X falls within 3σ of μ, that is, in the interval μ±3σ
Using probability notation, we may write
0.68=P(μ−σ<X<μ+σ)
0.95=P(μ−2σ<X<μ+2σ)
0.997=P(μ−3σ<X<μ+3σ)
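The three probabilities above are rounded values, and they can be checked against the normal curve itself. The sketch below is an illustration, not part of the original lesson; it assumes Python 3.8 or later, where the standard library provides `statistics.NormalDist`.

```python
from statistics import NormalDist

# Any mu and sigma give the same three probabilities, so the
# standard normal (mu = 0, sigma = 1) is used here for simplicity.
standard = NormalDist(mu=0, sigma=1)

def prob_within(k):
    """P(mu - k*sigma < X < mu + k*sigma) for a normal random variable X."""
    return standard.cdf(k) - standard.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
```

The computed values are about 0.6827, 0.9545 and 0.9973, which the rule rounds to 0.68, 0.95 and 0.997.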
Comment
Notice that the information from the rule can be interpreted from the perspective of the tails of the normal curve: since .68 is the probability of being within 1 standard deviation of the mean, (1 – .68) / 2 = .16 is the probability of being further than 1 standard deviation below the mean (or further than 1 standard deviation above the mean). Likewise, (1 – .95) / 2 = .025 is the probability of being more than 2 standard deviations below (or above) the mean; (1 – .997) / 2 = .0015 is the probability of being more than 3 standard deviations below (or above) the mean. The three figures below illustrate this.
Example
Suppose that X, the foot length (in inches) of a randomly chosen adult male, is a normal random variable with mean μ=11 and standard deviation σ=1.5. Then the Standard Deviation Rule lets us sketch the probability distribution of X as follows:
(a) What is the probability that a randomly chosen adult male will have a foot length between 8 and 14 inches? .95, or 95%.
(b) An adult male is almost guaranteed (.997 probability) to have a foot length between what two values? 6.5 and 15.5 inches.
(c) The probability is only 2.5% that an adult male will have a foot length greater than how many inches? 14.
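The arithmetic behind answers (a) through (c) can be written out as a short sketch. The helper names `rule_interval` and `one_tail` are hypothetical, introduced here only for illustration; the probabilities are the rounded ones from the Standard Deviation Rule, not exact values.

```python
# Probabilities quoted by the Standard Deviation Rule,
# keyed by the number of standard deviations from the mean.
RULE = {1: 0.68, 2: 0.95, 3: 0.997}

def rule_interval(mu, sigma, k):
    """Interval mu +/- k*sigma as a (low, high) pair."""
    return (mu - k * sigma, mu + k * sigma)

def one_tail(k):
    """Probability of falling more than k standard deviations above (or below) the mean."""
    return (1 - RULE[k]) / 2

mu, sigma = 11, 1.5  # foot length in inches, from the example

print("(a)", rule_interval(mu, sigma, 2), "has probability", RULE[2])
print("(b)", rule_interval(mu, sigma, 3), "has probability", RULE[3])
print("(c) values above", rule_interval(mu, sigma, 2)[1],
      "have probability", round(one_tail(2), 4))
```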
Now you should try a few. (Use the figure that is just before part (a) to help you.)
Exercise
(d) How likely or unlikely is a male’s foot length to be smaller than 9.5 inches? Not too unlikely, since the probability of being smaller than 9.5 is ____ (choose one: 0.025, 0.05, 0.16, 0.32, 0.68), which is not a particularly low probability.
(e) How likely or unlikely is a foot length longer than 15.5 inches? Extremely unlikely, since the probability of being longer than 15.5 is only ____ (choose one: 0.0015, 0.003, 0.015, 0.025, 0.05).
(f) There is a probability of 0.5 that a male’s foot is shorter than ____ inches (choose one: 9.5, 11, 12.5).
Comment
Notice that there are two types of problems we may want to solve: those like (a), (d) and (e), in which a particular interval of values of a normal random variable is given, and we are asked to find a probability, and those like (b), (c) and (f), in which a probability is given and we are asked to identify what the normal random variable’s values would be.
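The two problem types correspond to two inverse operations on the distribution. As a hedged illustration (the lesson itself does not use software here), the sketch below assumes Python's `statistics.NormalDist`: its `cdf` method goes from a value to a probability, and its `inv_cdf` method goes from a probability back to a value. Parts (a) and (c) of the example serve as test cases.

```python
from statistics import NormalDist

foot = NormalDist(mu=11, sigma=1.5)  # foot length in inches, from the example

# Type 1: an interval of values is given, and we want a probability (as in part (a)).
p_between = foot.cdf(14) - foot.cdf(8)

# Type 2: a probability is given, and we want the value (as in part (c)).
# A 2.5% upper tail means 97.5% of the distribution lies below the cutoff.
cutoff = foot.inv_cdf(1 - 0.025)

print(f"P(8 < X < 14) = {p_between:.4f}")
print(f"value with 2.5% above it = {cutoff:.2f} inches")
```

The results are close to, but not exactly, the rule's answers of 0.95 and 14 inches, because the Standard Deviation Rule uses rounded figures.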
Exercise
Let’s go back to our example of foot length:
How likely or unlikely is it for a male’s foot length to be more than 13 inches?
Since 13 inches doesn’t happen to be exactly 1, 2, or 3 standard deviations away from the mean, we would only be able to give a very rough estimate of the probability at this point. Clearly, the Standard Deviation Rule only describes the tip of the iceberg, and while it serves well as an introduction to the normal curve, and gives us a good sense of what would be considered likely and unlikely values, it is very limited in the probability questions it can help us answer.
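The "very rough estimate" can be made concrete: 13 lies between μ+σ = 12.5 and μ+2σ = 14, so the rule only tells us that the probability falls between the two corresponding tail probabilities. The sketch below is an illustration under the assumption that Python's `statistics.NormalDist` is available; it compares that rough bracket with a directly computed value.

```python
from statistics import NormalDist

mu, sigma = 11, 1.5
foot = NormalDist(mu, sigma)

# Rough bracket from the Standard Deviation Rule:
lower_bound = (1 - 0.95) / 2   # P(X > 14), i.e. more than 2 sigma above the mean
upper_bound = (1 - 0.68) / 2   # P(X > 12.5), i.e. more than 1 sigma above the mean

# Direct computation from the normal curve.
computed = 1 - foot.cdf(13)

print(f"rule: between {lower_bound:.3f} and {upper_bound:.3f}")
print(f"computed: {computed:.4f}")
```

The computed value, roughly 0.09, sits inside the bracket, but the rule alone cannot pin it down any further.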
Here is another familiar normal distribution:
Suppose we are interested in knowing the probability that a randomly selected student will score 633 or more on the math portion of his or her SAT (this is represented by the red area). Again, 633 does not fall exactly 1, 2, or 3 standard deviations above the mean. Notice, however, that an SAT score of 633 and a foot length of 13 are both about 1/3 of the way between 1 and 2 standard deviations above their respective means. As you continue to read this page, you’ll realize that this positioning relative to the mean is the key to finding probabilities.
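"Positioning relative to the mean" can be expressed as the number of standard deviations a value lies above the mean. The sketch below illustrates this with a hypothetical helper, `standard_deviations_above`. Note that this section does not state the SAT mean or standard deviation; the values 500 and 100 are assumptions chosen only because they are consistent with the description above.

```python
def standard_deviations_above(x, mu, sigma):
    """How many standard deviations x lies above the mean (negative if below)."""
    return (x - mu) / sigma

# Foot length: mean 11 and standard deviation 1.5 come from the example.
foot_position = standard_deviations_above(13, mu=11, sigma=1.5)

# ASSUMPTION: SAT math mean 500 and standard deviation 100 are illustrative
# values, not given in this section.
sat_position = standard_deviations_above(633, mu=500, sigma=100)

print(f"foot length 13: {foot_position:.2f} standard deviations above the mean")
print(f"SAT score 633:  {sat_position:.2f} standard deviations above the mean")
```

Under these assumptions both values land at about 1.33 standard deviations above the mean, even though the two variables are measured on entirely different scales.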