3.1: Explanatory and Response Variables
While it is fundamentally important to know how to describe the distribution of a single variable, most studies pose research questions that involve exploring the relationship between two variables using the collected data.
Here are a few examples of such research questions, each involving two variables:
Examples
- Is there a relationship between gender and test scores on a particular standardized test? Other ways of phrasing the same research question include the following:
  - Is performance on the test related to gender?
  - Is there a gender effect on test scores?
  - Are there differences in test scores between males and females?
- How is the number of calories in a hot dog related to (or affected by) the type of hot dog (beef, meat or poultry)? In other words, are there differences in the number of calories among the three types of hot dogs?
- Is there a relationship between the type of light a baby sleeps with (no light, night-light, lamp) and whether or not the child develops nearsightedness?
- Are the smoking habits of a person (yes, no) related to the person’s gender?
- How well can we predict a student’s freshman year GPA from his/her SAT score?
- What is the relationship between driver’s age and sign legibility distance (the maximum distance at which the driver can read a sign)?
- Is there a relationship between the time a person has practiced driving while having a learner’s permit and whether or not this person passed the driving test?
- Can you predict a person’s favorite type of music (classical, rock, jazz) on the basis of his/her IQ level?
In most studies involving two variables, each of the variables has a role. We distinguish between:
- The explanatory variable (also commonly referred to as the independent variable): the variable that is thought to explain, predict, or affect the response; and
- The response variable (also commonly referred to as the dependent variable): the outcome of the study.
Typically, the explanatory (or independent) variable is denoted by X, while the response (or dependent) variable is denoted by Y.
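To make the X and Y notation concrete (the symbols $n$, $x_i$, and $y_i$ below are introduced here for illustration only): if both variables are recorded for each of $n$ individuals, the collected data form $n$ pairs, each listing the explanatory value first:

$$(x_1, y_1),\ (x_2, y_2),\ \ldots,\ (x_n, y_n)$$

Here $x_i$ is the value of X and $y_i$ is the value of Y for the $i$-th individual. In the SAT question above, for instance, $x_i$ would be the $i$-th student's SAT score and $y_i$ that student's freshman-year GPA.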
Explanatory and Response Variables
In this course, we use the terms explanatory and response variables instead of independent and dependent variables.
Now, let’s go back to some of the examples and classify the two relevant variables according to their roles in the study.
Example 1
We want to explore whether the outcome of the study—the score on a test—is affected by the test-taker’s gender.
Therefore:
- Gender is the explanatory variable.
- Test score is the response variable.
Example 2
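Here we are examining whether the type of light a baby sleeps with (no light, night-light, lamp) affects whether or not the child develops nearsightedness.
Therefore:
- Light type is the explanatory variable.
- Nearsightedness is the response variable.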
Example 3
Here we are examining whether a student's SAT score is a good predictor of the student's freshman-year GPA.
Therefore:
- SAT score is the explanatory variable.
- Freshman-year GPA is the response variable.
Example 4
Here we are examining whether a person’s outcome on the driving test (pass/fail) can be explained by the length of time this person has practiced driving prior to the test.
Therefore:
- Time is the explanatory variable.
- Driving test outcome is the response variable.
If we further classify each of the two relevant variables according to type (categorical or quantitative), we get the following four possibilities for role-type classification:
- Categorical explanatory and quantitative response
- Categorical explanatory and categorical response
- Quantitative explanatory and quantitative response
- Quantitative explanatory and categorical response
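This role-type classification can be summarized and easily visualized in the following table (note that the explanatory variable is always listed first):

| Explanatory | Response: Categorical | Response: Quantitative |
|---|---|---|
| Categorical | C→C | C→Q |
| Quantitative | Q→C | Q→Q |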
This role-type classification serves as the infrastructure for this entire section. In each of the four cases, different statistical tools (displays and numerical measures) should be used to explore the relationship between the two variables. This suggests the following important principle:
Principle
When confronted with a research question that involves exploring the relationship between two variables, the first and most crucial step is to determine which of the four cases represents the data structure of the problem. In other words, the first step should be classifying the two relevant variables according to their role and type; only then can we determine which statistical tools should be used to analyze them.
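The classification step in the principle above can be sketched in Python. This is a minimal illustration, not part of the course materials: the helper name `role_type_case` and the type strings `"categorical"` and `"quantitative"` are assumptions. The helper only maps already-determined types to a case label; deciding whether a variable is categorical or quantitative still requires thinking about what the variable measures.

```python
# Hypothetical helper (not from the course): maps the types of the two
# variables to the shorthand case label, explanatory variable first.
TYPE_CODES = {"categorical": "C", "quantitative": "Q"}


def role_type_case(explanatory_type: str, response_type: str) -> str:
    """Return the role-type case, e.g. "C→Q", for a pair of variable types.

    Each argument must be "categorical" or "quantitative" (case-insensitive,
    surrounding spaces ignored); any other value raises ValueError.
    """
    codes = []
    for variable_type in (explanatory_type, response_type):
        code = TYPE_CODES.get(variable_type.strip().lower())
        if code is None:
            raise ValueError(f"unknown variable type: {variable_type!r}")
        codes.append(code)
    # The explanatory variable's code always comes first in the label.
    return f"{codes[0]}→{codes[1]}"
```

For example, Example 1 pairs a categorical explanatory variable (gender) with a quantitative response (test score), so `role_type_case("categorical", "quantitative")` returns `"C→Q"`.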
Now let’s go back to our examples and determine which of the four cases represents the data structure of each:
Example 1
- Gender is the explanatory variable, and it is categorical.
- Test score is the response variable, and it is quantitative.
Therefore, this is an example of case C→Q.
Example 2
- Light type is the explanatory variable, and it is categorical.
- Nearsightedness is the response variable, and it is categorical.
Therefore, this is an example of case C→C.
Example 3
- SAT score is the explanatory variable, and it is quantitative.
- Freshman-year GPA is the response variable, and it is quantitative.
Therefore, this is an example of case Q→Q.
Example 4
- Time is the explanatory variable, and it is quantitative.
- Driving test outcome is the response variable, and it is categorical.
Therefore, this is an example of case Q→C.
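The four classifications above can also be collected in one place. The sketch below is a hypothetical encoding (the names `EXAMPLES`, `TYPE_CODES`, and `CASES` are illustrative assumptions, not course material) that stores each example's explanatory and response variables with their types and derives the case label, explanatory variable first.

```python
# Hypothetical encoding of Examples 1-4 from this section.
# Each value is ((explanatory name, type), (response name, type)).
EXAMPLES = {
    "Example 1": (("gender", "categorical"), ("test score", "quantitative")),
    "Example 2": (("light type", "categorical"), ("nearsightedness", "categorical")),
    "Example 3": (("SAT score", "quantitative"), ("freshman-year GPA", "quantitative")),
    "Example 4": (("time", "quantitative"), ("driving test outcome", "categorical")),
}

# Shorthand letters used in the case labels.
TYPE_CODES = {"categorical": "C", "quantitative": "Q"}

# Derive each example's case label, listing the explanatory code first.
CASES = {
    name: f"{TYPE_CODES[x_type]}→{TYPE_CODES[y_type]}"
    for name, ((_, x_type), (_, y_type)) in EXAMPLES.items()
}
```

Looking up `CASES["Example 4"]` gives `"Q→C"`, matching the classification above.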
Did I get this?
In the following three problems, you are presented with a brief description of a study involving two variables. Based on the role-type classification of the two variables, you are asked to determine which of the four cases represents the data structure of the problem.
For your convenience, here again is the role-type classification table: