The greatest obstacles to identifying social causes are the
chaotic complexity and the dense interdependencies of social
life. We are overwhelmed by the multiplicity of potential
causes. Experiments restrict what can vary,
but both practical and ethical obstacles commonly rule out
experiments on most of the issues that motivate sociologists and
other social scientists. Strategic sampling seeks to narrow
the differences between study populations to the causal influences
we want to understand. Although adding control variables is
commonly discussed in the context of quantitative research,
the issues have an equivalent conceptual status in all social research.
Control variables refer to any condition or process
that our research does not include as part of the causal
relationships that we seek to measure and test, but which we suspect
may influence or condition those causal relationships we want to
explore.
In a standard experimental design, the study will have a treatment group and a control group. The treatment group – which may comprise patients receiving a drug, plots of land receiving fertilizer, or whatever other unit of analysis is relevant – is exposed to whatever we are testing, while the control group is not. In a drug trial, for example, the control group may receive a placebo while the treatment group receives the drug being evaluated. If the experiment is double-blind, neither the subjects nor the researchers know who received the drug and who received the placebo. If we are testing a fertilizer, we might divide a large plot of land into small squares, then alternate between our test fertilizer and a comparison fertilizer on the squares to create a checkerboard pattern (you might want to consider why this could be preferable to random assignment of the squares). The control group therefore consists of research subjects, whatever they may be, that are randomly or systematically selected from the same pool that supplies the treatment subjects. The control group undergoes the same sequence of experiences as the treatment group, except that it does not receive the treatment. The goal is to give the researcher confidence that any difference between the measured outcomes for the two groups, beyond what chance alone would produce, is due to the treatment, because the two groups are otherwise the same, having similar composition (as a consequence of random or systematic selection) and similar experiences.
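The logic of random assignment can be illustrated with a small simulation. The sketch below is our own illustration, not a procedure drawn from any particular study: it assumes a hypothetical pool of units whose baseline outcomes vary for reasons the researcher never observes, and a made-up treatment effect of 2.0. Because assignment is a coin flip unrelated to the baseline, the difference between the group means should fall close to the true effect, with the remaining gap reflecting chance.

```python
import random
import statistics


def simulate_experiment(n_units=1000, true_effect=2.0, seed=1):
    """Randomly assign hypothetical units to treatment or control and return
    the difference in mean outcomes (treatment minus control)."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n_units):
        # Each unit's baseline outcome varies for reasons we never observe
        # (made-up mean 10 and standard deviation 3).
        baseline = rng.gauss(10.0, 3.0)
        # Random assignment: a fair coin flip, independent of the baseline.
        if rng.random() < 0.5:
            treated.append(baseline + true_effect)
        else:
            control.append(baseline)
    return statistics.mean(treated) - statistics.mean(control)


print(f"estimated effect: {simulate_experiment():.2f}")
```

Raising `n_units` shrinks the chance component of the gap. Nothing in the code makes the two groups identical; randomization only makes their differences unrelated to the treatment, which is why experimental conclusions concern differences larger than chance would produce.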
With observational data from surveys, censuses, registries,
interviewing, historical records, and the like, used for research
where experimentation is implausible for varied reasons, we have no
way to replicate the control versus treatment contrast of an
experiment. So, if we simply look at the relationships between
what we believe are critical causes and the outcomes in question, we
are in the troubling circumstance that the relationships we measure
could be due to causes outside the set that interests us. For
example, it has become commonplace in the U.S. for the media and
politicians to present general comparisons between racial/ethnic
groups (Whites, Blacks, Hispanics, etc.) about everything from infant
mortality to political attitudes. Yet, we know that a large
proportion of the apparent differences between these groups is more
accurately attributed to differences in their class
composition (as an amalgam of economically defined conditions such as
income and education). If, for example, we want to consider the
educational disadvantages of being born into a poor Black family, it
is critical to consider how much of the disadvantage reflects poverty
and how much results from racial identity.
As we cannot eliminate the influence of “other” social conditions
and processes on the outcomes that concern us, we try to control for
these influences either prior to data collection through strategic
sampling or afterward through control variables or controlled
comparisons. To extend our previous example, if we are
personally interviewing people, we might choose four relatively
homogeneous neighborhoods that differ by residents' typical class and
race (lower-class Black, lower-class White, middle-class Black, and
middle-class White). Alternatively, if we are using survey data,
we might use one of several multivariate statistical techniques to distinguish
class from race effects and simultaneously control for other
conditions observed in the data that might disguise or exaggerate the
effects of race or class. We are generally concerned with those
“other” influences that might have direct or indirect causal relations
with both the effects or outcomes (dependent variables) we are
trying to explain and the factors competing for recognition
as causes (independent variables). These “other” influences
might create a “spurious” relationship between our causal variables
and our outcome variables that we would mistakenly infer to be
causal. Or they might suppress or disguise a causal
relationship so that we do not recognize it.
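Both the spurious and the suppressed relationships described above can be reproduced with simulated data. In the sketch below, every variable and coefficient is hypothetical: `z` stands for an “other” influence, `x` for a candidate cause that `z` partly drives, and `y` for the outcome. The controlled estimate partials `z` out of both `x` and `y` before computing the slope, which by the Frisch-Waugh-Lovell theorem gives the same coefficient on `x` as a least-squares regression of `y` on both `x` and `z`. It is one way to control for an observed condition, not the only one.

```python
import random


def mean(v):
    return sum(v) / len(v)


def slope(y, x):
    """Bivariate least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualize(v, z):
    """Residuals of v after a bivariate least-squares regression on z."""
    b = slope(v, z)
    a = mean(v) - b * mean(z)
    return [vi - (a + b * zi) for vi, zi in zip(v, z)]


def partial_slope(y, x, z):
    """Coefficient on x in a regression of y on x and z,
    computed by partialling z out of both x and y (Frisch-Waugh-Lovell)."""
    return slope(residualize(y, z), residualize(x, z))


def simulate(beta_x, beta_z, n=20000, seed=42):
    """Generate hypothetical data in which z influences x, and both x and z
    may influence y; return the (uncontrolled, controlled) slopes for x."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 1) for zi in z]  # z pushes x up
    y = [beta_x * xi + beta_z * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    return slope(y, x), partial_slope(y, x, z)


# Spurious relationship: x has no effect on y; z drives both.
naive, controlled = simulate(beta_x=0.0, beta_z=2.0)
print(f"spurious:    uncontrolled {naive:.2f}, controlled {controlled:.2f}")

# Suppression: x has a real effect (1.0), but z raises x while lowering y.
naive, controlled = simulate(beta_x=1.0, beta_z=-2.0)
print(f"suppression: uncontrolled {naive:.2f}, controlled {controlled:.2f}")
```

In the first scenario the uncontrolled slope (near 1) suggests an effect of `x` that does not exist; in the second it (near 0) hides an effect that does. Controlling for `z` recovers the true values here only because the simulation guarantees that `z` is the sole “other” influence and is measured without error, conditions that observational data rarely certify.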
In practice, scholarly journals in sociology and related
disciplines have published many large-sample (commonly termed quantitative)
papers where the authors have apparently chosen control variables
using an uncritical, opportunistic strategy. They match the
variables available in the data they use with implicit lists of
variables often used as controls in sociological research. As a
common strategy, researchers run their analyses with an extended set
of these “control” variables, eventually keeping two subgroups: (1)
those that produce statistically significant coefficients (or the
like) that seem to bolster the “discovery” character of the research
and (2) those that show no or little effect, but that other scholars
might question if absent (e.g. “why didn’t you control for
race?”). From the perspective of serious scholarship and
scientific integrity, this practice is deeply flawed. From the
perspective of efficient production of publications within the scope
of common, current practices, this can be a practical adaptation,
however unwarranted.
Analogously, small-sample studies (commonly considered qualitative)
suffer from a failure to control for conflated
influences. This is partly because the room to take causal
interactions into account shrinks as the sample gets smaller, but it also
too often signals inadequate research design and overreaching
interpretations.
All of the above may seem an unnecessary repetition of what is obvious
to any graduate student or even advanced undergraduate in the social
sciences. Unfortunately, a sizable part of the work
published in sociology and related disciplines still falls prey to
these issues, which the statistically inclined refer to as
omitted variables or, more generally, as model misspecification.
Such mistakes can be catalogued reasonably well for
statistical regression models: we omit variables that matter, we
include ones that do not, we use the wrong functional form
(for example, assuming two causal influences are additive when
they are really multiplicative), we make mistaken assumptions about
the way errors occur, and so on. However, analogous
issues exist for small-sample qualitative research and for
theoretical arguments. These are all mistakes we make about the
possibilities of causal influences, how they relate to each other, and
how we can avoid mistaken claims and inferences that result from
seeing only a slice of the picture.
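The functional-form error mentioned above, treating multiplicative influences as additive, can also be simulated. In this hypothetical sketch, `y` is generated as the product of `x` and a two-valued condition `z` (values 1 and 3, chosen arbitrarily), plus noise. An additive model y = a + b*x + c*z reports a single slope for `x` (about 2 here, the average of the two true slopes), whereas estimating the slope separately within each level of `z` shows that the effect of `x` is roughly three times larger where z = 3 than where z = 1.

```python
import random


def mean(v):
    return sum(v) / len(v)


def slope(y, x):
    """Bivariate least-squares slope of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residualize(v, z):
    """Residuals of v after a bivariate least-squares regression on z."""
    b = slope(v, z)
    a = mean(v) - b * mean(z)
    return [vi - (a + b * zi) for vi, zi in zip(v, z)]


def additive_slope(y, x, z):
    """Coefficient on x in the additive model y = a + b*x + c*z,
    obtained by partialling z out of both x and y."""
    return slope(residualize(y, z), residualize(x, z))


def slope_within(y, x, z, level):
    """Slope of y on x among observations where z equals the given level."""
    xs = [xi for xi, zi in zip(x, z) if zi == level]
    ys = [yi for yi, zi in zip(y, z) if zi == level]
    return slope(ys, xs)


rng = random.Random(7)
n = 20000
x = [rng.gauss(0, 1) for _ in range(n)]
z = [rng.choice([1.0, 3.0]) for _ in range(n)]  # hypothetical two-level condition
# The true process is multiplicative: the effect of x on y equals z itself.
y = [xi * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

pooled = additive_slope(y, x, z)
low = slope_within(y, x, z, 1.0)
high = slope_within(y, x, z, 3.0)
print(f"additive model, slope for x: {pooled:.2f}")
print(f"within z = 1: {low:.2f}; within z = 3: {high:.2f}")
```

Neither estimate is wrong as arithmetic; the additive coefficient simply answers a question (the average effect of `x`) whose answer misdescribes both subgroups.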
We have no general solution to these issues. But awareness and vigilance can go a long way.