1.5: Experimental Design

Now that we have learned about the first stage of data production (sampling), we can move on to the next stage: designing studies.

Introduction

Obviously, sampling is not done for its own sake. After this first stage in the data production process is completed, we come to the second stage, that of gaining information about the variables of interest from the sampled individuals. In this module we’ll discuss three study designs; each design enables you to determine the values of the variables in a different way. You can:

– Carry out an observational study, in which values of the variable or variables of interest are recorded as they naturally occur. There is no interference by the researchers who conduct the study.

– Take a sample survey, which is a particular type of observational study in which individuals report variables’ values themselves, frequently by giving their opinions.

– Perform an experiment. Instead of assessing the values of the variables as they naturally occur, the researchers interfere, and they are the ones who assign the values of the explanatory variable to the individuals. The researchers “take control” of the values of the explanatory variable because they want to see how changes in the value of the explanatory variable affect the response variable. (Note: By nature, any experiment involves at least two variables.)

The type of design used, and the details of the design, are crucial, since they will determine what kind of conclusions we may draw from the results. In particular, when studying relationships in the Exploratory Data Analysis unit, we stressed that an association between two variables does not guarantee that a causal relationship exists. In this module, we will explore how the details of a study design play a crucial role in determining our ability to establish evidence of causation.

Here is how this module is organized:

We’ll start this module by learning how to identify study types. In particular, we will highlight the distinction between observational studies and experiments.

We will then discuss each of the three study designs mentioned above.

  • We’ll discuss observational studies, focusing on why it is difficult to establish causation in these types of studies, as well as other possible flaws.
  • We’ll then focus on experiments, learning, among other things, that when appropriately designed, experiments can provide evidence of causation.
  • We’ll end the module by discussing surveys and sample size.

Learning Objectives

  • Identify the design of a study (controlled experiment vs. observational study) and other features of the study design (randomized, blind, etc.).

Identifying Study Design

Because each type of study design has its own advantages and trouble spots, it is important to begin by determining what type of study we are dealing with. The following example helps to illustrate how we can distinguish among the three basic types of design mentioned in the introduction—observational studies, sample surveys, and experiments.

Example

Suppose researchers want to determine whether people tend to snack more while they watch television. In other words, the researchers would like to explore the relationship between the explanatory variable “TV” (a categorical variable that takes the values “on” and “not on”) and the response variable “snack consumption.”

Identify each of the following designs as being an observational study, a sample survey, or an experiment.

1. Recruit participants for a study. While they are presumably waiting to be interviewed, half of the individuals sit in a waiting room with snacks available and a TV on. The other half sit in a waiting room with snacks available and no TV, just magazines. Researchers determine whether people consume more snacks in the TV setting.

This is an experiment, because the researchers take control of the explanatory variable of interest (TV on or not) by assigning each individual to either watch TV or not, and determine the effect that has on the response variable of interest (snack consumption).

2. Recruit participants for a study. Give them journals to record hour by hour their activities the following day, including when they watch TV and when they consume snacks. Determine if snack consumption is higher during TV times.

This is an observational study, because the participants themselves determine whether or not to watch TV. There is no attempt on the researchers’ part to interfere.

3. Recruit participants for a study. Ask them to recall, for each hour of the previous day, whether they were watching TV, and what snacks they consumed each hour. Determine whether snack consumption was higher during the TV times.

This is also an observational study; again, it was the participants themselves who decided whether or not to watch TV. Do you see the difference between 2 and 3? See the comment below.

4. Poll a sample of individuals with the following question: While watching TV, do you tend to snack: (a) less than usual; (b) more than usual; or (c) the same amount as usual?

This is a sample survey, because the individuals self-assess the relationship between TV watching and snacking.

Comment

Notice that in Example 2, the values of the variables of interest (TV watching and snack consumption) are recorded forward in time. Such observational studies are called prospective. In contrast, in Example 3, the values of the variables of interest are recorded backward in time. This is called a retrospective observational study. We’ll discuss this distinction later in this module.


While some studies are designed to gather information about a single variable, many studies attempt to draw conclusions about the relationship between two variables. In particular, researchers often would like to produce evidence that one variable actually causes changes in the other. For example, the research question addressed in the previous example sought to establish evidence that watching TV could cause an increase in snacking. Such studies may be especially useful and interesting, but they are also especially vulnerable to flaws that could invalidate the conclusion of causation. In several of the examples in this module we will see that although evidence of an association between two variables may be quite clear, the question of whether one variable is actually causing changes in the other may be too murky to be entirely resolved. In general, with a well-designed experiment we have a better chance of establishing causation than with an observational study. However, experiments are also subject to certain pitfalls, and there are many situations in which an experiment is not an option. A well-designed observational study may still provide fairly convincing evidence of causation under the right circumstances.

Experiments vs. Observational Studies

Before assessing the effectiveness of observational studies and experiments for producing evidence of a causal relationship between two variables, we will illustrate the essential differences between these two designs.

Example

Every day, a huge number of people are engaged in a struggle whose outcome could literally affect the length and quality of their life: they are trying to quit smoking. Just the array of techniques, products, and promises available shows that quitting is not easy, nor is its success guaranteed. Researchers would like to determine which of the following is the best method:

1. Drugs that alleviate nicotine addiction.

2. Therapy that trains smokers to quit.

3. A combination of drugs and therapy.

4. Neither form of intervention (quitting “cold turkey”).

The explanatory variable is the method (1, 2, 3, or 4), while the response variable is eventual success or failure in quitting. In an observational study, values of the explanatory variable occur naturally. In this case, this means that the participants themselves choose a method of trying to quit smoking. In an experiment, researchers assign the values of the explanatory variable. In other words, they tell people what method to use. Let us consider how we might compare the four techniques, via either an observational study or an experiment.

1. An observational study of the relationship between these two variables requires us to collect a representative sample from the population of smokers who are beginning to try to quit. We can imagine that a substantial proportion of that population is trying one of the four above methods. In order to obtain a representative sample, we might use a nationwide telephone survey to identify 1,000 smokers who are just beginning to quit smoking. We record which of the four methods the smokers use. One year later, we contact the same 1,000 individuals and determine whether they succeeded.

2. In an experiment, we again collect a representative sample from the population of smokers who are just now trying to quit, using a nationwide telephone survey of 1,000 individuals. This time, however, we divide the sample into 4 groups of 250 and assign each group to use one of the four methods to quit. One year later, we contact the same 1,000 individuals and determine whose attempts succeeded while using our designated method.

The following figures illustrate the two study designs:

1. Observational study:

A visual representation of the Observational Study. A large circle represents the entire population. Through random selection we generate the sample, which is represented as a smaller circle. The circle representing the samples is divided up unevenly into 4 pieces, each piece representing one value of the explanatory variable (method), which have been "Self-Assigned" by the people in the sample.

2. Experiment:

A visual representation of the Experimental Study. A large circle represents the entire population. Through random selection we generate the sample, which is represented as a smaller circle. The circle representing the samples is divided up evenly into 4 pieces, each piece representing one value of the explanatory variable (method), which have been assigned by the researchers.

Both the observational study and the experiment begin with a random sample from the population of smokers just now beginning to quit. In both cases, the individuals in the sample can be divided into categories based on the values of the explanatory variable: method used to quit. The response variable is success or failure after one year. Finally, in both cases, we would assess the relationship between the variables by comparing the proportions of success of the individuals using each method, using a two-way table and conditional percentages.

The only difference between the two methods is the way the sample is divided into categories for the explanatory variable (method). In the observational study, individuals are divided based upon the method by which they choose to quit smoking. The researcher does not assign the values of the explanatory variable, but rather records them as they naturally occur. In the experiment, the researcher deliberately assigns one of the four methods to each individual in the sample. The researcher intervenes by controlling the explanatory variable, and then assesses its relationship with the response variable.

Now that we have outlined two possible study designs, let’s return to the original question: which of the four methods for quitting smoking is most successful? Suppose the study’s results indicate that individuals who try to quit with the combination drug/therapy method have the highest rate of success, and those who try to quit with neither form of intervention have the lowest rate of success, as illustrated in the hypothetical two-way table below:

Hypothetical results of the study, by quitting method:

Method             Quit   Didn't Quit   Total   % Who Quit
Cold Turkey          12           238     250           5%
Drugs Only           60           190     250          24%
Therapy Only         59           191     250          24%
Drugs & Therapy      83           167     250          33%
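The "% Who Quit" column is a conditional percentage: the share of each method group that succeeded in quitting. The calculation can be sketched in a few lines of Python, using the counts from the hypothetical table above. The variable and function names here are illustrative choices, not part of the study.

```python
# Counts from the hypothetical two-way table:
# method -> (number who quit, number who didn't quit)
counts = {
    "Cold Turkey": (12, 238),
    "Drugs Only": (60, 190),
    "Therapy Only": (59, 191),
    "Drugs & Therapy": (83, 167),
}


def percent_quit(n_quit, n_not_quit):
    """Conditional percentage of success within one method group."""
    total = n_quit + n_not_quit
    return 100 * n_quit / total


# Success rate for each method, conditional on the method used.
rates = {method: percent_quit(q, d) for method, (q, d) in counts.items()}

for method, rate in rates.items():
    print(f"{method:<16} {rate:.1f}%")

# The method with the highest conditional percentage.
best_method = max(rates, key=rates.get)
```

Comparing these row percentages, rather than the raw counts, is what lets us compare methods fairly. It says nothing yet about whether the method caused the difference.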

Can we conclude that using the combination drugs and therapy method caused the smokers to quit most successfully? Which type of design was implemented will play an important role in the answer to this question.


Causation and Observational Studies

Learning Objective

  • Explain how the study design impacts the types of conclusions that can be drawn.

Suppose the observational study described in the previous section were carried out, and researchers determined that the percentage succeeding with the combination drug/therapy method was highest, while the percentage succeeding with neither therapy nor drugs was lowest. In other words, suppose there is clear evidence of an association between method used and success rate. Could they then conclude that the combination drug/therapy method causes success more than using neither therapy nor a drug?

A visual representation of the relationship between the Explanatory Variable and Response Variable. The Explanatory Variable is the Method being used, and it might or might not be related to the Response Variable, which is either Success or Failure.

It is at precisely this point that we confront the underlying weakness of most observational studies: some members of the sample have opted for certain values of the explanatory variable (method of quitting), while others have opted for other values. It could be that those individuals may be different in additional ways that would also play a role in the response of interest. For instance, suppose women are more likely to choose certain methods to quit, and suppose women in general tend to quit more successfully than men. The data would make it appear that the method itself were responsible for success, whereas in truth it may just be that being female is the reason for success. We can express this scenario in terms of the key variables involved. In addition to the explanatory variable (method) and the response variable (success or failure), a third, lurking variable (gender) is tied in (or confounded) with the explanatory variable’s values, and may itself cause the response to be success or failure. The following diagram illustrates this situation.

Here, the Explanatory Variable, which is the Method, may affect the Response Variable, which is Success or Failure. We also have a Lurking Variable, which is Gender. It is confounded with the Explanatory Variable, so it may also be affecting the Response Variable.

Since the difficulty arises because of the lurking variable’s values being tied in with those of the explanatory variable, one way to attempt to unravel the true nature of the relationship between explanatory and response variables is to separate out the effects of the lurking variable. In general, we control for the effects of a lurking variable by separately studying groups that are similar with respect to this variable.

We could control for the lurking variable “gender” by studying women and men separately. Then, if both women and men who chose one method have higher success rates than those opting for another method, we would be closer to producing evidence of causation.

Now, we have two separate studies. The study for Women is represented using one circle representing the Method, connected to another circle with an arrow. The other circle represents Success or Failure. We have the same diagram for Men, showing that Women and Men are now separated into totally different studies.

The diagram above demonstrates how straightforward it is to control for the lurking variable gender.
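To see what studying women and men separately can reveal, here is a minimal Python sketch. All counts are invented for illustration (they are not study data), and only two of the four methods are included to keep the example short. The data are deliberately constructed so that gender, not method, drives success.

```python
# HYPOTHETICAL counts, invented for illustration only (not study data).
# Each record: (gender, method, number who quit, number in group)
records = [
    ("women", "drugs & therapy", 80, 200),
    ("women", "cold turkey",     20,  50),
    ("men",   "drugs & therapy",  5,  50),
    ("men",   "cold turkey",     20, 200),
]


def success_rate(rows):
    """Percentage who quit among all people in the given rows."""
    quitters = sum(r[2] for r in rows)
    total = sum(r[3] for r in rows)
    return 100 * quitters / total


methods = ["drugs & therapy", "cold turkey"]

# Pooled analysis: ignore gender entirely.
pooled = {m: success_rate([r for r in records if r[1] == m]) for m in methods}

# Controlled analysis: compare methods separately within each gender.
by_gender = {
    g: {m: success_rate([r for r in records if r[0] == g and r[1] == m])
        for m in methods}
    for g in ("women", "men")
}

print("Pooled:", pooled)
print("By gender:", by_gender)
```

In this made-up data set the pooled comparison shows a large gap (34% versus 16%), yet within each gender the two methods perform identically (40% for women, 10% for men). The apparent effect of method comes entirely from the fact that women both chose the combination method more often and quit more successfully. Real data would rarely be this clean.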

Notice that we did not claim that controlling for gender would allow us to make a definite claim of causation, only that we would be closer to establishing a causal connection. This is due to the fact that other lurking variables may also be involved, such as the level of the participants’ desire to quit. Specifically, those who have chosen to use the drug/therapy method may already be the ones who are most determined to succeed, while those who have chosen to quit without investing in drugs or therapy may, from the outset, be less committed to quitting. The following diagram illustrates this scenario.

The Explanatory Variable, which is the Method, may affect the Response Variable, which is Success or Failure. We also have a Lurking Variable, which is Desire to Quit. It is confounded with the Explanatory Variable because those with more desire to quit may use drugs and/or therapy. So, the Lurking Variable may also be affecting the Response Variable.

To attempt to control for this lurking variable, we could interview the individuals at the outset in order to rate their desire to quit on a scale of 1 (weakest) to 5 (strongest), and study the relationship between method and success separately for each of the five groups. But desire to quit is obviously a very subjective thing, difficult to assign a specific number to. Realistically, we may be unable to effectively control for the lurking variable “desire to quit.”

Furthermore, who’s to say that gender and/or desire to quit are the only lurking variables involved? There may be other subtle differences among individuals who choose one of the four methods that researchers fail to imagine as they attempt to control for possible lurking variables. For example, smokers who opt to quit using neither therapy nor drugs may tend to be in a lower income bracket than those who opt for (and can afford) drugs and/or therapy. Perhaps smokers in a lower income bracket also tend to be less successful in quitting because more of their family members and co-workers smoke. Thus, socioeconomic status is yet another possible lurking variable in the relationship between cessation method and success rate.

It is because of the existence of a virtually unlimited number of potential lurking variables that we can never be 100% certain of a claim of causation based on an observational study. On the other hand, observational studies are an extremely common tool used by researchers to attempt to draw conclusions about causal connections. If great care is taken to control for the most likely lurking variables (and to avoid other pitfalls which we will discuss presently), and if common sense indicates that there is good reason for one variable to cause changes in the other, then researchers may assert that an observational study provides good evidence of causation.

Observational studies are subject to other pitfalls besides lurking variables, arising from various aspects of the design for evaluating the explanatory and response values. The next pair of examples illustrates some other difficulties that may arise.

Example

Suppose researchers want to determine if people tend to snack more while they watch TV. One possible design that we considered was to recruit participants for an observational study, and give them journals to record their hourly activities the following day, including TV watched and snacks consumed. Then they could review the journals to determine if snack consumption was higher during TV times.

We identified this as a prospective observational study, carried forward in time. Studying people in the more natural setting of their own homes makes the study more realistic than a contrived experimental setting. Still, when people are obliged to record their behavior as it occurs, they may be too self-conscious to act naturally. They may want to avoid embarrassment and so they may cut back on their TV viewing, or their snack consumption, or the combination of the two.

Example

Yet another possible design is to recruit participants for a retrospective observational study. Ask them to recall, for each hour of the previous day, whether they were watching TV, and what snacks they consumed each hour. Determine if food consumption was higher during the TV times.

This design has the advantage of not disturbing people’s natural behavior in terms of TV viewing or snacking. It has the disadvantage of relying on people’s memories to record those variables’ values from the day before. But one day is a relatively short period of time to remember such details, and as long as people are willing to be honest, the results of this study could be fairly reliable. The issue of eliciting honest responses will be addressed in our discussion of sample surveys.

By now you should have an idea of how difficult—or perhaps even impossible—it is to establish causation in an observational study, especially due to the problem of lurking variables. The key to establishing causation is to rule out the possibility of any lurking variable, or in other words, to ensure that individuals differ only with respect to the values of the explanatory variable. In general, this is a goal which we have a much better chance of accomplishing by carrying out a well-designed experiment.

Causation and Experiments

Learning Objectives

  • Explain how the study design impacts the types of conclusions that can be drawn.
  • Identify the design of a study (controlled experiment vs. observational study) and other features of the study design (randomized, blind, etc.).

Recall that in an experiment, it is the researchers who assign values of the explanatory variable to the participants. The key to ensuring that individuals differ only with respect to explanatory values—which is also the key to establishing causation—lies in the way this assignment is carried out. Let’s return to the smoking cessation study as a context to explore the essential ingredients of experimental design.

Example

In our discussion of the distinction between observational studies and experiments, we described the following experiment: collect a representative sample of 1,000 individuals from the population of smokers who are just now trying to quit. We divide the sample into 4 groups of 250 and instruct each group to use a different method to quit. One year later, we contact the same 1,000 individuals and determine whose attempts succeeded while using our designated method.

This was an experiment, because the researchers themselves determined the values of the explanatory variable of interest for the individuals studied, rather than letting them choose.

We will begin by using the context of this smoking cessation example to illustrate the specialized vocabulary of experiments. First of all, the explanatory variable, or factor, in this case is the method used to quit. The different imposed values of the explanatory variable, or treatments (common abbreviation: ttt), consist of the four possible quitting methods. The groups receiving different treatments are called treatment groups. The group that tries to quit without drugs or therapy could be called the control group: those individuals on whom no specific treatment was imposed. Ideally, the subjects (human participants in an experiment) in each treatment group differ from those in the other treatment groups only with respect to the treatment (quitting method). As mentioned in our discussion of why lurking variables prevent us from establishing causation in observational studies, eliminating all other differences among treatment groups will be the key to asserting causation via an experiment. How can this be accomplished?

Randomized Controlled Experiments

Your intuition may already tell you, correctly, that random assignment to treatments is the best way to prevent treatment groups of individuals from differing from each other in ways other than the treatment assigned. Either computer software or tables can be utilized to accomplish the random assignment. The resulting design is called a randomized controlled experiment, because researchers control values of the explanatory variable with a randomization procedure. Under random assignment, the groups should not differ significantly with respect to any potential lurking variable. Then, if we see a relationship between the explanatory and response variables, we have evidence that it is a causal one.

A visual representation of the experimental study. A large circle represents the entire population. Through random selection we generate the sample, which is represented as a smaller circle. The circle representing the samples is divided up evenly into 4 pieces, each piece representing one value of the explanatory variable (method). The pieces are treatment groups randomly assigned by researchers.
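As a rough illustration of how software might carry out the random assignment, here is a minimal Python sketch. It assumes 1,000 subjects identified by the numbers 0 to 999, and it invents a lurking variable ("highly motivated to quit") purely to show how the groups can be compared afterward. The function name, the seeds, and the 40% motivation rate are all hypothetical choices, not part of the study.

```python
import random

TREATMENTS = ["drugs", "therapy", "drugs & therapy", "cold turkey"]


def assign_randomly(subject_ids, treatments, seed=None):
    """Shuffle the subjects, then deal them into equal-sized treatment groups."""
    rng = random.Random(seed)
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    group_size = len(shuffled) // len(treatments)
    return {
        t: shuffled[i * group_size:(i + 1) * group_size]
        for i, t in enumerate(treatments)
    }


# Hypothetical sample: subject IDs 0..999, plus a made-up lurking variable
# ("highly motivated") that the researchers would not normally observe.
rng = random.Random(1)
subjects = list(range(1000))
motivated = {s: rng.random() < 0.4 for s in subjects}

groups = assign_randomly(subjects, TREATMENTS, seed=42)

# Compare the share of highly motivated subjects across treatment groups.
for treatment, members in groups.items():
    share = 100 * sum(motivated[s] for s in members) / len(members)
    print(f"{treatment:<16} n={len(members)}  motivated: {share:.1f}%")
```

The printed shares should be close to one another but not identical. Random assignment balances the groups on average, for lurking variables the researchers measured and for those they never thought of, but it does not make the groups exactly alike.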

Comment

Note that in a randomized controlled experiment, a randomization procedure may be used in two phases. First, a sample of subjects is collected. Ideally it would be a random sample so that it would be perfectly representative of the entire population. (Comment: often researchers have no choice but to recruit volunteers. Using volunteers may help to offset one of the drawbacks to experimentation which will be discussed later, namely the problem of noncompliance.) Second, we assign individuals randomly to the treatment groups to ensure that the only difference between them will be due to the treatment and we can get evidence of causation. At this stage, randomization is vital.

Let’s discuss some other issues related to experimentation.

Inclusion of a Control Group

A common misconception is that an experiment must include a control group of individuals receiving no treatment. There may be situations where a complete lack of treatment is not an option, or where including a control group is ethically questionable, or where researchers explore the effects of a treatment without making a comparison. Here are a few examples:

Example

If doctors want to conduct an experiment to determine whether Prograf or Cyclosporin is more effective as an immunosuppressant, they could randomly assign transplant patients to take one or the other of the drugs. It would, of course, be unethical to include a control group of patients not receiving any immunosuppressants.

Example

Recently, experiments have been conducted in which the treatment is a highly invasive brain surgery. The only way to have a legitimate control group in this case is to randomly assign half of the subjects to undergo the entire surgery except for the actual treatment component (inserting stem cells into the brain). This, of course, is also ethically problematic (but, believe it or not, is being done).

Example

There may even be an experiment designed with only a single treatment. For example, makers of a new hair product may ask a sample of individuals to treat their hair with that product over a period of several weeks, then assess how manageable their hair has become. Such a design is clearly flawed because of the absence of a comparison group, but it is still an experiment because use of the product has been imposed by its manufacturers, rather than chosen naturally by the individuals. A flawed experiment is nevertheless an experiment.

Comment

The word control is used in at least three different senses. In the context of observational studies, we control for a confounding variable by separating it out. Referring to an experiment as a controlled experiment stresses that the values of the experiment’s explanatory variables (factors) have been assigned by researchers, as opposed to having occurred naturally. In the context of experiments, the control group consists of subjects who do not receive a treatment, but who are otherwise handled identically to those who do receive the treatment.

Blind and Double-Blind Experiments

Suppose the experiment about methods for quitting smoking were carried out with randomized assignments of subjects to the four treatments, and researchers determined that the percentage succeeding with the combination drug/therapy method was highest, and the percentage succeeding with no drugs or therapy was lowest. In other words, suppose there is clear evidence of an association between method used and success rate. Could it be concluded that the drug/therapy method causes success more than trying to quit without using drugs or therapy? Perhaps.

Although randomized controlled experiments do give us a better chance of pinning down the effects of the explanatory variable of interest, they are not completely problem-free. For example, suppose that the manufacturers of the smoking cessation drug had just launched a very high-profile advertising campaign with the goal of convincing people that their drug is extremely effective as a method of quitting. Even with a randomized assignment to treatments, there would be an important difference among subjects in the four groups: those in the drug and combination drug/therapy groups would perceive their treatment as being a promising one, and may be more likely to succeed just because of added confidence in the success of their assigned method. Therefore, the ideal circumstance is for the subjects to be unaware of which treatment is being administered to them: in other words, subjects in an experiment should be (if possible) blind to which treatment they received.

How could researchers arrange for subjects to be blind when the treatment involved is a drug? They could administer a placebo pill to the control group, so that there are no psychological differences between those who receive the drug and those who do not. The word “placebo” is derived from a Latin word that means “to please.” It is so named because of the natural tendency of human subjects to improve just because of the “pleasing” idea of being treated, regardless of the benefits of the treatment itself. When patients improve because they are told they are receiving treatment, even though they are not actually receiving treatment, this is known as the placebo effect.

Next, how could researchers arrange for subjects to be blind when the treatment involved is a type of therapy? This is more problematic. Clearly, subjects must be aware of whether they are undergoing some type of therapy or not. There is no practical way to administer a “placebo” therapy to some subjects. Thus, the relative success of the drug/therapy treatment may be due to subjects’ enhanced confidence in the success of the method they happened to be assigned. We may feel fairly certain that the method itself causes success in quitting, but we cannot be absolutely sure.

When the response of interest is fairly straightforward, such as giving up cigarettes or not, then recording its values is a simple process in which researchers need not use their own judgment in making an assessment. There are many experiments where the response of interest is less definite, such as whether or not a cancer patient has improved, or whether or not a psychiatric patient is less depressed. In such cases, it is important for researchers who evaluate the response to be blind to which treatment the subject received, in order to prevent the experimenter effect from influencing their assessments. If neither the subjects nor the researchers know who was assigned what treatment, then the experiment is called double-blind.

The most reliable way to determine whether the explanatory variable is actually causing changes in the response variable is to carry out a randomized controlled double-blind experiment. Depending on the variables of interest, such a design may not be entirely feasible, but the closer researchers get to achieving this ideal design, the more convincing their claims of causation (or lack thereof) are.


Pitfalls in Experimentation

Some of the inherent difficulties that may be encountered in experimentation are the Hawthorne effect, lack of realism, noncompliance, and treatments that are unethical, impossible, or impractical to impose.

We already introduced a hypothetical experiment to determine if people tend to snack more while they watch TV: Recruit participants for the study. While they are presumably waiting to be interviewed, half of the individuals sit in a waiting room with snacks available and a TV on. The other half sit in a waiting room with snacks available and no TV, just magazines. Researchers determine whether people consume more snacks in the TV setting.

Suppose that, in fact, the subjects who sat in the waiting room with the TV consumed more snacks than those who sat in the room without the TV. Could we conclude that in their everyday lives, and in their own homes, people eat more snacks when the TV is on? Not necessarily, because people’s behavior in this very controlled setting may be quite different from their ordinary behavior. If they suspect their snacking behavior is being observed, they may alter their behavior, either consciously or subconsciously. This phenomenon, whereby people in an experiment behave differently from how they would normally behave, is called the Hawthorne effect. Even if they don’t suspect they are being observed in the waiting room, the relationship between TV and snacking there might not be representative of what it is in real life. One of the greatest advantages of an experiment—that researchers take control of the explanatory variable—can also be a disadvantage in that it may result in a rather unrealistic setting. Lack of realism (also called lack of ecological validity) is a possible drawback to the use of an experiment rather than an observational study to explore a relationship. Depending on the explanatory variable of interest, it may be quite easy or it may be virtually impossible to take control of the variable’s values and still maintain a fairly natural setting.

In our hypothetical smoking cessation example, both the observational study and the experiment were carried out on a random sample of 1,000 smokers with intentions to quit. In the case of the observational study, it would be reasonably feasible to locate 1,000 such people in the population at large, identify their intended method, and contact them again a year later to establish whether they succeeded or not. In the case of the experiment, it is not so easy to take control of the explanatory variable (cessation method) merely by telling all 1,000 subjects what method they must use. Noncompliance (failure to submit to the assigned treatment) could enter in on such a large scale as to render the results invalid. In order to ensure that the subjects in each treatment group actually undergo the assigned treatment, researchers would need to pay for the treatment and make it easily available. The cost of doing that for a group of 1,000 people would go beyond the budget of most researchers. Even if the drugs or therapy were paid for, it is very unlikely that most of the subjects contacted at random would be willing to use a method not of their own choosing, but dictated by the researchers. From a practical standpoint, such a study would most likely be carried out on a smaller group of volunteers, recruited via flyers or some other sort of advertisement. The fact that they are volunteers might make them somewhat different from the larger population of smokers with intentions to quit, but it would reduce the more worrisome problem of noncompliance. Volunteers may have a better overall chance of success, but if researchers are primarily concerned with which method is most successful, then the relative success of the various methods should be roughly the same for the volunteer sample as it would be for the general population, as long as the methods are randomly assigned.
Thus, the most vital stage for randomization in an experiment is during the assignment of treatments, rather than the selection of subjects.

There are other, more serious drawbacks to experimentation, as illustrated in the following hypothetical examples:

Example

Suppose researchers want to determine if the drug Ecstasy causes memory loss. One possible design would be to take a group of volunteers and randomly assign some to take Ecstasy on a regular basis, while the others are given a placebo. Test them periodically to see if the Ecstasy group experiences more memory problems than the placebo group.

The obvious flaw in this experiment is that it is unethical (and actually also illegal) to administer a dangerous drug like Ecstasy, even if the subjects are volunteers. The only feasible design to seek answers to this particular research question would be an observational study.

Example

Suppose researchers want to determine whether females wash their hair more frequently than males.

It is impossible to assign some subjects to be female and others male, and so an experiment is not an option here. Again, an observational study would be the only way to proceed.

Example

Suppose researchers want to determine whether being in a lower income bracket may be responsible for obesity in women, at least to some extent, because they can’t afford more nutritious meals and don’t have the means to participate in fitness activities.

The socioeconomic status of the study subject is a variable that cannot be controlled by the researchers, so an experiment is impossible. (Even if the researchers could somehow raise the money to provide a random sample of women with substantial salaries, the effects of their eating habits during their lives before the study began would still be present, and would affect the study’s outcome.)

These examples should convince you that, depending on the variables of interest, researching their relationship via an experiment may be too unrealistic, unethical, or impractical. Observational studies are subject to flaws, but often they are the only recourse.

Let’s summarize what we’ve learned so far:

1. Observational studies:

  • The explanatory variable’s values are allowed to occur naturally.
  • Because of the possibility of lurking variables, it is difficult to establish causation.
  • If possible, control for suspected lurking variables by studying groups of similar individuals separately.
  • Some lurking variables are difficult to control for; others may not be identified.

2. Experiments:

  • The explanatory variable’s values are controlled by researchers (treatment is imposed).
  • Randomized assignment to treatments automatically controls for all lurking variables.
  • Making subjects blind keeps the placebo effect from distorting the comparison between treatment groups.
  • Making researchers blind avoids conscious or subconscious influences on their subjective assessment of responses.
  • A randomized controlled double-blind experiment is generally optimal for establishing causation.
  • A lack of realism may prevent researchers from generalizing experimental results to real-life situations.
  • Noncompliance may undermine an experiment. A volunteer sample might solve (at least partially) this problem.
  • It is impossible, impractical or unethical to impose some treatments.

Experiments With More Than One Explanatory Variable

Learning Objectives

  • Identify the design of a study (controlled experiment vs. observational study) and other features of the study design (randomized, blind etc.).

It is not uncommon for experiments to feature two or more explanatory variables (called factors). In this course, we focus on exploratory data analysis and statistical inference in situations which involve only one explanatory variable. Nevertheless, we will now consider the design for experiments involving several explanatory variables, in order to familiarize students with their basic structure.

Example

Suppose researchers are interested not only in the effect of diet on blood pressure, but also in the effect of two new drugs. Subjects are assigned to either Control Diet (no restrictions), Diet #1, or Diet #2 (the variable Diet, then, has three possible values) and are also assigned to receive either Placebo, Drug #1, or Drug #2 (the variable Drug, then, also has three values). This is an example where the experiment has two explanatory variables and a response variable. In order to set up such an experiment, there has to be one treatment group for every combination of categories of the two explanatory variables. Thus, in this case there are 3 × 3 = 9 combinations of the two variables to which the subjects are assigned. The treatment groups are illustrated and labeled in the following table:

The columns are the values of the Diet variable and the rows are the values of the Drug variable. Each of the 9 cells is one treatment group ("ttt"), one for every possible combination of row and column:

              No-diet    Special diet 1    Special diet 2
  Placebo     ttt1       ttt2              ttt3
  Drug 1      ttt4       ttt5              ttt6
  Drug 2      ttt7       ttt8              ttt9

Subjects would be randomly assigned to one of the nine treatment groups. If we find differences in the proportions of subjects who achieve the lower “moderate zone” blood pressure among the nine treatment groups, then we have evidence that the diets and/or drugs may be effective for reducing blood pressure.

From the population we generate a sample. The individuals of the sample are represented as a whole visually with a circle. These individuals are then divided by randomly assigning them to one of the 9 treatment groups. These treatment groups are "ttt1: no-diet and placebo," "ttt2: diet 1 and placebo," "ttt3: diet 2 and placebo," and so on, up to "ttt9: diet 2 and drug 2." The responses from each of these treatment groups are compared.
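
The idea of one treatment group for every combination of the two factors can be illustrated with a short Python sketch. The function name treatment_groups and the numbering order of the groups in between ttt3 and ttt9 are assumptions made for illustration; the text only states ttt1 through ttt3 and ttt9 explicitly.

```python
from itertools import product

# Levels of the two explanatory variables (factors), as labeled in the table.
DIETS = ["No-diet", "Special diet 1", "Special diet 2"]
DRUGS = ["Placebo", "Drug 1", "Drug 2"]


def treatment_groups(diets, drugs):
    """Return a dict mapping labels ttt1, ttt2, ... to (diet, drug) pairs.

    Numbering runs across the diets within each drug (an assumption that
    agrees with ttt1-ttt3 and ttt9 as described in the text).
    """
    combos = [(diet, drug) for drug, diet in product(drugs, diets)]
    return {f"ttt{i}": combo for i, combo in enumerate(combos, start=1)}


groups = treatment_groups(DIETS, DRUGS)
for label, (diet, drug) in groups.items():
    print(f"{label}: {diet} and {drug}")
```

With three levels per factor the dictionary has 3 × 3 = 9 entries. Adding a level to either list would add a whole row or column of treatment groups.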

Comments

  1. Recall that randomization may be employed at two stages of an experiment: in the selection of subjects, and in the assignment of treatments. The former may be helpful in allowing us to generalize what occurs among our subjects to what would occur in the general population, but the reality of most experimental settings is that a convenience or volunteer sample is used. Most likely the blood pressure study described above would use volunteer subjects. The important thing is to make sure these subjects are randomly assigned to one of the nine treatment combinations.
  2. In order to gain optimal information about individuals in all the various treatment groups, we would like to make assignments not just randomly, but also evenly. If there are 90 subjects in the blood pressure study described above, and 9 possible treatment groups, then each group should be filled randomly with 10 individuals. A simple random sample of 10 could be taken from the larger group of 90, and those individuals would be assigned to the first treatment group. Next, the second treatment group would be filled by a simple random sample of 10 taken from the remaining 80 subjects. This process would be repeated until all 9 groups are filled with 10 individuals each.
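
The even, random assignment described in Comment 2 might be sketched in Python as follows. This is only an illustration: the function name assign_evenly, the subject numbers 1 through 90, and the fixed seed are assumptions, not part of the study described above.

```python
import random


def assign_evenly(subjects, group_labels, seed=None):
    """Randomly split subjects into equal-sized treatment groups.

    Follows the procedure in the text: draw a simple random sample for the
    first group, then draw the next group from the remaining subjects, and
    so on until every group is filled. Subjects must be unique and hashable.
    """
    if len(subjects) % len(group_labels) != 0:
        raise ValueError("subjects cannot be divided evenly among groups")
    rng = random.Random(seed)
    size = len(subjects) // len(group_labels)
    remaining = list(subjects)
    assignment = {}
    for label in group_labels:
        chosen = rng.sample(remaining, size)  # simple random sample
        assignment[label] = chosen
        chosen_set = set(chosen)
        remaining = [s for s in remaining if s not in chosen_set]
    return assignment


# Hypothetical subject IDs 1..90 and the nine treatment groups.
subjects = list(range(1, 91))
labels = [f"ttt{i}" for i in range(1, 10)]
assignment = assign_evenly(subjects, labels, seed=1)
for label in labels:
    print(label, sorted(assignment[label]))
```

Shuffling the full list of 90 subjects once and cutting it into consecutive blocks of 10 would produce an equivalent random assignment; the version above simply mirrors the step-by-step wording of the comment.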

Did I get this?

A university was interested in examining the overall effectiveness of its online statistics course, along with the effectiveness of particular aspects of the course. First, the university wanted to see whether the online course was better than a standard course. Second, the university wanted to know whether students learned best using Excel, using Minitab, or using no statistical package at all. The university randomly selected a group of 30 students and administered one of the different variants of the course (i.e., traditional or online, coupled with one of the software options) to each student. The success of each variant was measured by the students’ average improvement between a pre-test and a post-test.

Modifications to Randomization

Learning Objectives

  • Identify the design of a study (controlled experiment vs. observational study) and other features of the study design (randomized, blind etc.).

In some cases, an experiment’s design may be enhanced by relaxing the requirement of total randomization and blocking the subjects first, dividing them into groups of individuals who are similar with respect to an outside variable that may be important in the relationship being studied. This can help ensure that the effects of the treatments, as well as of the background variables, are measured more accurately. In blocking, we simply split the sampled subjects into blocks based upon the different values of the background variable, and then randomly allocate treatments within each block. Thus, blocking in the assignment of subjects is analogous to stratification in sampling.

For example, consider again our experiment examining the differences between three versions of software from the last Learn By Doing activity. If we suspected that gender might affect individuals’ software preferences, we might choose to allocate subjects to separate blocks, one for males and one for females. Within each block, subjects are randomly assigned to treatments and the treatment proceeds as usual. A diagram of blocking in this situation is below:

We have 2 blocks, 3 treatment groups each (by random assignment). From the population we generate a sample. This sample of individuals is then split into two blocks, Males and Females. Each block is then randomly split further into the three treatment groups: "ttt1: existing software," "ttt2: new software 1," and "ttt3: new software 2." So, we end up with 6 total groups. Within each block the responses from the treatment groups are compared to each other, generating results separately for each block.
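
As a rough illustration of blocking, the Python sketch below first splits subjects into blocks and then randomizes treatments separately inside each block. The function name blocked_assignment, the twelve subject IDs, and their genders are all hypothetical and chosen only to keep the example small.

```python
import random
from collections import defaultdict


def blocked_assignment(subjects, block_of, treatments, seed=None):
    """Randomly assign treatments separately within each block.

    subjects:   list of subject identifiers
    block_of:   dict mapping each subject to its block (e.g. "Male")
    treatments: list of treatment labels
    Returns a dict mapping each subject to a treatment.
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    for s in subjects:
        blocks[block_of[s]].append(s)

    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)  # randomization happens inside the block
        # Deal treatments out in rotation so that group sizes within a
        # block differ by at most one.
        for i, s in enumerate(shuffled):
            assignment[s] = treatments[i % len(treatments)]
    return assignment


# Hypothetical subjects: IDs and genders are made up for illustration.
subjects = [f"s{i}" for i in range(1, 13)]
block_of = {s: ("Male" if i < 6 else "Female") for i, s in enumerate(subjects)}
treatments = ["existing software", "new software 1", "new software 2"]

assignment = blocked_assignment(subjects, block_of, treatments, seed=0)
for s in subjects:
    print(s, block_of[s], assignment[s])
```

Because the shuffle is done block by block, every treatment appears equally often among the males and among the females, which is what distinguishes this design from a completely randomized one.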

Example

Suppose producers of gasoline want to compare which of two types of gas results in better mileage for automobiles. In case the size of the vehicle plays a role in the effectiveness of different types of gasoline, they could first block by vehicle size, then randomly assign some cars within each block to Gasoline A and others to Gasoline B:
This example consists of 2 blocks, 2 treatment groups each (by random assignment). From the population we generate a sample, then separate it into two blocks, "Small" and "Large," according to vehicle size. Within these blocks we randomly assign vehicles to use either Gasoline A or Gasoline B (so each block is split into two treatment groups, "ttt1: Gasoline A" and "ttt2: Gasoline B"), resulting in 4 total groups. Then, within each block, we compare the responses, so we obtain results for each block individually.

In the extreme, researchers may examine a relationship for a sample of blocks of just two individuals who are similar in many important respects, or even the same individual whose responses are compared for two explanatory values.

Example

For example, researchers could compare the effects of Gasoline A and Gasoline B when both are used on the same car, for a sample of many cars of various sizes and models.
In this matched pairs design we have n blocks, one for each individual car, with 2 treatments each, ordered by random assignment. From the population we generate the sample. The sample is then placed into n blocks, one per car. Each block receives both treatments, "ttt1: Gasoline A" and "ttt2: Gasoline B," in random order. For each car, the responses to the two treatments are compared.
Such a study design, called matched pairs, may enable us to pinpoint the effects of the explanatory variable by comparing responses for the same individual under two explanatory values, or for two individuals who are as similar as possible except that the first gets one treatment, and the second gets another (or serves as the control). Treatments should usually be assigned at random within each pair, or the order of treatments should be randomized for each individual. In our gasoline example, for each car the order of testing (Gasoline A first, or Gasoline B first) should be randomized.
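
A minimal Python sketch of the two matched pairs steps for the gasoline example, randomizing the order of treatments for each car and then comparing the two responses car by car, might look like this. The function names, car IDs, and mileage figures are invented purely for illustration.

```python
import random


def randomize_order(units, treatments=("Gasoline A", "Gasoline B"), seed=None):
    """For each unit (car), pick at random which treatment is tested first."""
    rng = random.Random(seed)
    orders = {}
    for u in units:
        order = list(treatments)
        rng.shuffle(order)
        orders[u] = tuple(order)
    return orders


def paired_differences(responses_a, responses_b):
    """Return per-unit differences (A minus B) for units measured under both."""
    return {u: responses_a[u] - responses_b[u] for u in responses_a}


cars = ["car1", "car2", "car3"]
orders = randomize_order(cars, seed=0)
for car in cars:
    print(car, "is tested first with", orders[car][0])

# Made-up mileage figures (miles per gallon), for illustration only.
mpg_a = {"car1": 30, "car2": 22, "car3": 41}
mpg_b = {"car1": 28, "car2": 23, "car3": 38}
diffs = paired_differences(mpg_a, mpg_b)
print(diffs)
```

Working with the per-car differences, rather than with two separate groups of cars, removes car-to-car variation (size, model, age) from the comparison.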

Example

Suppose researchers want to compare the relative merits of toothpastes with and without tartar control ingredients. In order to make the comparison between individuals who are as similar as possible with respect to background and diet, they could obtain a sample of identical twins. One of each pair would randomly be assigned to brush with the tartar control toothpaste, while the other would brush with regular toothpaste of the same brand. These would be provided in unmarked tubes, so that the subjects would be blind. To make the experiment double-blind, dentists who evaluate the results would not know who used which toothpaste.
Paired design. There are n blocks, each represented by a circle containing a pair of identical twins. Within each pair, one twin is randomly assigned the tartar control toothpaste and the other the regular toothpaste. So, each circle has two twins and two types of toothpaste, and each twin is randomly assigned one type.
“Before-and-after” studies are another common type of matched pairs design. For each individual, the response variable of interest is measured twice: first before the treatment, then again after the treatment. The categorical explanatory variable is which treatment was applied, or whether a treatment was applied, to that participant.

Comment

We have explained data production as a two-stage process: first obtain the sample, then evaluate the variables of interest via an appropriate study design. Even though the steps are carried out in this order chronologically, it is generally best for researchers to decide on a study design before they actually obtain the sample. For the toothpaste example above, researchers would first decide to use the matched pairs design, then obtain a sample of identical twins, then carry out the experiment and assess the results.

Did I get this?

Researchers wanted to study whether or not Botox injected under the arms can reduce sweating.

Sample Surveys

Learning Objective

  • Determine how the features of a survey impact the collected data and the accuracy of the data.

A sample survey is a particular type of observational study in which individuals report variables’ values themselves, frequently by giving their opinions. Researchers have several options to choose from when deciding how to survey the individuals involved: in person, or via telephone, Internet, or mail.

The following issues in the design of sample surveys will be discussed:

  • open vs. closed questions
  • unbalanced response options
  • leading questions
  • planting ideas with questions
  • complicated questions
  • sensitive questions

These issues are best illustrated with a variety of concrete examples.

Suppose you want to determine the musical preferences of all students at your university, based on a sample of students. In the Sampling section, we discussed various ways to obtain the sample, such as taking a simple random sample from all students at the university, then contacting the chosen subjects via email to request their responses and following up with a second email to those who did not respond the first time. This method would ensure a sample that is fairly representative of the entire population of students at the university, and would avoid the bias that might result from flawed designs such as a convenience sample or a volunteer sample.

However, even if we managed to select a representative sample for a survey, we are not yet home free: we must still compose the survey question itself so that the information we gather from the sampled students correctly represents what is true about their musical preferences. Let us consider some possibilities:

Question: “What is your favorite kind of music?”

This is what we call an open question, which allows for almost unlimited responses. It may be difficult to make sense of all the possible categories and subcategories of music that survey respondents could come up with. Some may be more general than what you had in mind (“I like modern music the best”) and others too specific (“I like Japanese alternative electronic rock by Cornelius”). Responses are much easier to handle if they come from a closed question:

Question: Which of these types of music do you prefer: classical, rock, pop, or hip-hop?

What will happen if a respondent is asked the question as worded above, and he or she actually prefers jazz or folk music or gospel? He or she may pick a second-favorite from the options presented, or try to pencil in the real preference, or may just not respond at all. Whatever the outcome, it is likely that overall, the responses to the question posed in this way will not give us very accurate information about general music preferences. If a closed question is used, then great care should be taken to include all the reasonable options that are possible, including “not sure.” Also, in case an option was overlooked, “other:___________” should be included for the sake of thoroughness.

Many surveys ask respondents to assign a rating to a variable, such as in the following:

Question: How do you feel about classical music? Circle one of these: I love it, I like it very much, I like it, I don’t like it, I hate it.

Notice that the options provided are rather “top-heavy,” with three favorable options vs. two unfavorable. If someone feels somewhat neutral, they may opt for the middle choice, “I like it,” and a summary of the survey’s results would distort the respondents’ true opinions.

Some survey questions are either deliberately or unintentionally biased towards certain responses:

Question: “Do you agree that classical music is the best type of music, because it has survived for centuries and is not only enjoyable, but also intellectually rewarding? (Answer yes or no.)”

This sort of wording puts ideas in people’s heads, urging them to report a particular opinion. One way to test for bias in a survey question is to ask yourself, “Just from reading the question, would a respondent have a good idea of what response the surveyor is hoping to elicit?” If the answer is yes, then the question should have been worded more neutrally.

Sometimes, survey questions are ordered in such a way as to deliberately bias the responses by planting an idea in an earlier question that will sway people’s thoughts in a later question.

Question: In the year 2002, there was much controversy over the fact that the Augusta National Golf Club, which hosts the Masters Golf Tournament each year, does not accept women as members. Defenders of the club created a survey that included the following statements. Respondents were supposed to indicate whether they agreed or disagreed with each statement:

“The First Amendment of the U.S. Constitution applies to everyone regardless of gender, race, religion, age, profession, or point of view.”

“The First Amendment protects the right of individuals to create a private organization consisting of a specific group of people based on age, gender, race, ethnicity, or interest.”

“The First Amendment protects the right of organizations like the Boy Scouts, the Girl Scouts, and the National Association for the Advancement of Colored People to exist.”

“Individuals have a right to join a private group, club, or organization that consists of people who share the same interests and personal backgrounds as they do if they so desire.”

“Private organizations that are not funded by the government should be allowed to decide who becomes a member and who does not become a member on their own, without being forced to take input from other outside people or organizations.”

Notice how the first and second statements steer people to favor the opinion that specialized groups may form private clubs. The third statement reminds people of organizations that are formed by groups on the basis of gender and race, setting the stage for them to agree with the fourth statement, which supports people’s rights to join any private club. This in turn leads into the fifth statement, which focuses on a private organization’s right to decide on its membership. As a group, the questions attempt to relentlessly steer a respondent towards ultimately agreeing with the club’s right to exclude women.

Sometimes surveyors attempt to get feedback on more than one issue at a time.

Question: “Do you agree or disagree with this statement: ‘I don’t go out of my way to listen to modern music unless there are elements of jazz, or else lyrics that are clear and make sense.’”

Put yourself in the place of people who enjoy jazz and straightforward lyrics, but don’t have an issue with music being “too modern,” per se. The logic of the question (or lack thereof) may escape the respondents, and they would be too confused to supply an answer that correctly conveys their opinion. Clearly, simple questions are much better than complicated ones; rather than try to gauge opinions on several issues at once, complex survey questions like this should be broken down into shorter, more concise ones.

Depending on the topic, we cannot always assume that survey respondents will answer honestly.

Question 1: “Have you eaten rutabagas in the past year?”

If respondents answer no, then we have good reason to believe that they did not eat rutabagas in the past year.

Question 2: “Have you used illegal drugs in the past year?”

If respondents answer no, then it is still a possibility that they did use illegal drugs, but didn’t want to admit it.

Effective techniques for collecting accurate data on sensitive questions are a main area of inquiry in statistics. One simple method is randomized response, which allows individuals in the sample to answer anonymously, while the researcher still gains information about the population. This technique is best illustrated by an example.

Example

For the question, “Have you used illegal drugs in the past year?” respondents are told to flip a fair coin (in private) before answering and then answer based on the result of the coin flip: if the coin flip results in “Heads,” they should answer “Yes” (regardless of the truth); if the coin flip results in “Tails,” they should answer truthfully. Thus, roughly half of the respondents are “truth-tellers,” and the other half give the uncomfortable answer “Yes,” without the interviewer’s knowledge of who is in which group. The respondent who flips “Tails” and answers truthfully knows that he or she cannot be distinguished from someone who got “Heads” in the coin toss. Hopefully, this is enough to encourage respondents to answer truthfully. As we will learn later in the course, the surveyor can then use probability methods to estimate the proportion of respondents who admit they used illegal drugs in this scenario, while being unable to identify exactly which respondents have used drugs.
Besides using the randomized response method, surveyors may encourage honest answers from respondents in various other ways. Tactful wording of questions can be very helpful. Giving people a feeling of anonymity by having them complete questionnaires via computer, rather than paper and pencil, is another commonly used technique.
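
As a preview of the probability reasoning behind the randomized response example above, here is a hedged Python sketch. With a fair coin, a respondent answers “Yes” either because the coin shows “Heads” (probability 0.5) or because the coin shows “Tails” and the truthful answer is “Yes,” so P(Yes) = 0.5 + 0.5 × p, where p is the true proportion. Solving for p gives p = 2 × P(Yes) − 1. The function names and the simulated survey are assumptions for illustration, and the sketch assumes that everyone follows the instructions honestly.

```python
import random


def estimate_true_proportion(num_yes, num_respondents):
    """Estimate the proportion whose truthful answer is "Yes".

    Under the coin-flip scheme, P(answer Yes) = 0.5 + 0.5 * p, where p is
    the true proportion, so p = 2 * P(answer Yes) - 1.
    """
    if num_respondents <= 0:
        raise ValueError("need at least one respondent")
    observed = num_yes / num_respondents
    estimate = 2 * observed - 1
    # Sampling variation can push the raw estimate outside [0, 1].
    return min(1.0, max(0.0, estimate))


def simulate_survey(true_p, n, seed=None):
    """Simulate n respondents who follow the coin-flip instructions honestly."""
    rng = random.Random(seed)
    num_yes = 0
    for _ in range(n):
        heads = rng.random() < 0.5           # forced "Yes"
        truth_is_yes = rng.random() < true_p  # what a truthful answer would be
        if heads or truth_is_yes:
            num_yes += 1
    return num_yes


# Simulated survey with a made-up true proportion of 0.2.
n = 100_000
yes_count = simulate_survey(true_p=0.2, n=n, seed=42)
print("Observed share of Yes answers:", yes_count / n)
print("Estimated true proportion:", estimate_true_proportion(yes_count, n))
```

For instance, if 600 of 1,000 respondents answered “Yes,” the estimate would be 2 × 0.6 − 1 = 0.2, even though no individual “Yes” can be attributed to actual drug use.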

Let’s summarize

A sample survey is a type of observational study in which respondents assess variables’ values (often by giving an opinion).

  • Open questions are less restrictive, but responses are more difficult to summarize.
  • Closed questions may be biased by the options provided.
  • Closed questions should permit options such as “other:______” and/or “not sure” if those options may apply.
  • Questions should be worded neutrally.
  • Earlier questions should not deliberately influence responses to later questions.
  • Questions shouldn’t be confusing or complicated.
  • Survey method and questions should be carefully designed to elicit honest responses if there are sensitive issues involved.


In this chapter we distinguished among different types of studies and learned the details of each type of study design. By doing so, we also expanded our understanding of the issue of establishing causation that was first discussed in the previous unit of the course. In the Exploratory Data Analysis unit, we learned that in general, association does not imply causation, due to the fact that lurking variables might be responsible for the association we observe, which means we cannot establish that there is a causal relationship between our “explanatory” variable and our response variable.

In this chapter we completed the causation puzzle by learning under what circumstances an observed association between variables CAN be interpreted as causation. We saw that in observational studies, the best we can do is to control for what we think might be potential lurking variables, but we can never be sure that there aren’t any others that we didn’t anticipate. Therefore, we can come closer to establishing causation, but never really establish it.

The only way we can, at least in theory, eliminate the effect of (or control for) ALL lurking variables is by conducting a randomized controlled experiment, in which subjects are randomly assigned to one of the treatment groups. Only in this case can we interpret an observed association as causation. Obviously, due to ethical or other practical reasons, not every study can be conducted as a randomized experiment. Where possible, however, a double-blind randomized controlled experiment is about the best study design we can use.

Another very common study design is the survey. While a survey is a special kind of observational study, it really is treated as a separate design, since it is so common and is the type of study that the general public is most often exposed to (polls). It is important that we be aware of the fact that the wording, ordering, or type of questions asked in a poll could have an impact on the response. In order for a survey’s results to be reliable, these issues should be carefully considered when the survey is designed.
