What Causes Gender Inequality?
     ... Analytical Strategies

SOC-GA 2227

Robert Max Jackson




~~~~~  Notes: Isolating Causes  ~~~~~

~~  via Control Variables or Strategic Sampling  ~~



The greatest obstacles to identifying social causes are the chaotic complexity and the dense interdependencies of social life.  We are overwhelmed by the multiplicity of potential causes.  Experiments restrict what can vary, but practical and ethical obstacles commonly rule out experiments for most of the issues that motivate sociologists and other social scientists.  Strategic sampling seeks to narrow the differences between study populations to the causal influences we want to understand.  Although control variables are usually discussed in the context of quantitative research, the underlying issues have the same conceptual status in all social research.  Control variables refer to any condition or process that our research does not treat as part of the causal relationships we seek to measure and test, but that we suspect may influence or condition those relationships.

In a standard experimental design, the research will have a treatment group and a control group.  The treatment group – which may comprise patients receiving a drug, plots of land receiving fertilizer, or whatever other unit of analysis is relevant – is exposed to whatever we are testing, while the control group is not.  In a drug trial, for example, the control group may receive a placebo while the treatment group receives the drug being evaluated.  If the experiment is double-blind, neither the subjects nor the researchers know who received the drug and who received the placebo.  If we are testing a fertilizer, we might divide a large plot of land into small squares, then alternate our test fertilizer and a comparison fertilizer across the squares to create a checkerboard pattern (you might want to consider why this could be preferable to random assignment of the squares).  The control group thus consists of research subjects, whatever they may be, randomly or systematically selected from the same pool that supplies the treatment subjects.  The control group undergoes the same sequence of experiences as the treatment subjects, except that it does not receive the treatment.  The goal is to give the researcher confidence that any difference between the measured outcomes for the two groups, beyond chance variation, can only be due to the treatment, because the two groups are otherwise the same, having similar composition (a consequence of random or systematic selection) and similar experiences.
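One way to explore the question raised above is a small simulation.  The sketch below is hypothetical throughout: it assumes a 10 x 10 field whose baseline (comparison-fertilizer) yield rises steadily across rows and columns, a test fertilizer that adds exactly 5 units to any square, and no other variation.  It compares the checkerboard layout with repeated random assignment of half the squares, estimating the effect each time as the difference in mean yield.  All names and numbers are invented for illustration.

```python
import random
import statistics

# Hypothetical field: a 10 x 10 grid of squares whose baseline yield rises
# steadily across rows and columns (an assumed fertility gradient).
SIZE = 10
TRUE_EFFECT = 5.0  # assumed effect of the test fertilizer, in yield units
SQUARES = [(row, col) for row in range(SIZE) for col in range(SIZE)]


def baseline(row, col):
    """Assumed yield of a square that gets only the comparison fertilizer."""
    return 50 + 2 * row + col


def estimate_effect(treated):
    """Difference in mean yield between treated and untreated squares."""
    treated_yields = [baseline(r, c) + TRUE_EFFECT for (r, c) in SQUARES if (r, c) in treated]
    control_yields = [baseline(r, c) for (r, c) in SQUARES if (r, c) not in treated]
    return statistics.fmean(treated_yields) - statistics.fmean(control_yields)


def checkerboard():
    """Treat every square whose row + column is even."""
    return {(r, c) for (r, c) in SQUARES if (r + c) % 2 == 0}


def random_assignment(rng):
    """Treat a randomly chosen half of the squares."""
    return set(rng.sample(SQUARES, len(SQUARES) // 2))


rng = random.Random(2227)  # fixed seed so the sketch is reproducible
checker_estimate = estimate_effect(checkerboard())
random_estimates = [estimate_effect(random_assignment(rng)) for _ in range(2000)]

print(f"checkerboard estimate: {checker_estimate:.2f}")
print(f"random assignment: mean {statistics.fmean(random_estimates):.2f}, "
      f"sd {statistics.stdev(random_estimates):.2f}")
```

With a smooth gradient like this one, the checkerboard puts five treated squares in every row and every column, so its estimate equals the assumed effect exactly (5.00), while random assignment is right on average but scatters around it from one draw to the next.  The advantage depends on the assumed pattern: if fertility happened to alternate from square to square in step with the checkerboard, the systematic layout would be biased, which is why random assignment protects against patterns the researcher does not know about.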

With observational data from surveys, censuses, registries, interviews, historical records, and the like, used for research where experimentation is implausible for various reasons, we have no way to replicate the control-versus-treatment contrast of an experiment.  So, if we simply look at the relationships between what we believe are critical causes and the outcomes in question, we face the troubling possibility that the relationships we measure are due to causes outside those that interest us.  For example, it has become commonplace in the U.S. for the media and politicians to present general comparisons between racial/ethnic groups (Whites, Blacks, Hispanics, etc.) on everything from infant mortality to political attitudes.  Yet we know that a large proportion of the apparent differences between these groups is more accurately attributed to differences in their class composition (class here being an amalgam of economically defined conditions such as income and education).  If, for example, we want to consider the educational disadvantages of being born into a poor Black family, it is critical to consider how much of the disadvantage reflects poverty and how much results from racial identity.
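A toy calculation shows how class composition alone can produce a group gap.  The numbers are invented and the groups are abstract labels ("A" and "B"), not estimates for any real population: by assumption, the outcome depends only on class, yet the two groups are distributed differently across classes.

```python
# Invented numbers for illustration; "A" and "B" are abstract group labels.
# Within each class both groups have the same mean outcome, so in this toy
# population group membership itself has no effect.
MEAN_OUTCOME = {
    ("A", "low"): 10.0, ("A", "high"): 20.0,
    ("B", "low"): 10.0, ("B", "high"): 20.0,
}

# Assumed class composition: the share of each group in each class.
CLASS_SHARE = {
    "A": {"low": 0.8, "high": 0.2},
    "B": {"low": 0.3, "high": 0.7},
}


def overall_mean(group):
    """Group mean: its class-specific means weighted by its class shares."""
    return sum(share * MEAN_OUTCOME[(group, cls)] for cls, share in CLASS_SHARE[group].items())


def within_class_gap(cls):
    """Group B minus group A, comparing only people of the same class."""
    return MEAN_OUTCOME[("B", cls)] - MEAN_OUTCOME[("A", cls)]


naive_gap = overall_mean("B") - overall_mean("A")
print(f"naive gap (B - A): {naive_gap:.1f}")
for cls in ("low", "high"):
    print(f"gap within {cls} class: {within_class_gap(cls):.1f}")
```

The overall comparison shows B ahead by 5.0 points (17.0 versus 12.0), while every within-class comparison shows no gap at all: the whole difference comes from composition.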

As we cannot eliminate the influence of “other” social conditions and processes on the outcomes that concern us, we try to control for these influences either before data collection, through strategic sampling, or afterward, through control variables or controlled comparisons.  To extend our previous example, if we are personally interviewing people, we might choose four relatively homogeneous neighborhoods that differ by their residents' typical class and race (lower-class Black, lower-class White, middle-class Black, and middle-class White).  Alternatively, if we are using survey data, we might use a multivariate statistical technique, such as regression, to distinguish class effects from race effects while simultaneously controlling for other conditions observed in the data that might disguise or exaggerate the effects of race or class.  We are generally concerned with those “other” influences that might have direct or indirect causal relations both with the effects or outcomes (dependent variables) we are trying to explain and with the things competing for recognition as causes (independent variables).  These “other” influences might create a “spurious” relationship between our causal variables and our outcome variables, which we would mistakenly infer to be causal.  Or they might suppress or disguise a causal relationship so that we fail to recognize it.
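The suppression case can be sketched with a regression on invented data.  The sketch assumes a noise-free outcome equal to 10, plus 10 for the higher class, plus 3 for membership in the abstract group "B", and a population in which group B is concentrated in the lower class.  The least-squares solver is hand-rolled only to keep the example free of third-party packages; real analyses would use an established statistics library.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Invented population of (group, higher_class) pairs: group B is concentrated
# in the lower class, group A in the higher class.
people = [("A", 0)] * 20 + [("A", 1)] * 80 + [("B", 0)] * 80 + [("B", 1)] * 20

# Assumed outcome model, with no noise: higher class adds 10, group B adds 3.
y = [10 + 10 * high + 3 * (group == "B") for group, high in people]

in_b = [1 if group == "B" else 0 for group, _ in people]
naive = ols([[1, b] for b in in_b], y)
controlled = ols([[1, b, high] for b, (_, high) in zip(in_b, people)], y)

print(f"group B coefficient, no controls: {naive[1]:+.2f}")
print(f"group B coefficient, controlling for class: {controlled[1]:+.2f}")
```

Without the class control, group B appears 3 points behind (-3.00); with it, the assumed advantage of 3 points (+3.00) reappears.  Here the omitted influence does not merely hide the effect but reverses its apparent sign.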

In practice, scholarly journals in sociology and related disciplines have published many large-sample (commonly termed quantitative) papers in which the authors have apparently chosen control variables using an uncritical, opportunistic strategy.  They match the variables available in their data against implicit lists of variables often used as controls in sociological research.  In a common strategy, researchers run their analyses with an extended set of these “control” variables and eventually keep two subgroups: (1) those that produce statistically significant coefficients (or the like) that seem to bolster the “discovery” character of the research, and (2) those that show little or no effect but whose absence other scholars might question (e.g., “why didn’t you control for race?”).  From the perspective of serious scholarship and scientific integrity, this practice is seriously flawed.  From the perspective of efficiently producing publications within the scope of common, current practices, it can be a practical adaptation, however unwarranted.
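One reason keeping whatever turns out “significant” is flawed can be shown by simulation.  The sketch below generates candidate variables that are pure noise, independent of the outcome, and screens each one with a significance test; every variable that passes is a false positive.  The cutoff |r| > 1.96 / sqrt(n) is a large-sample approximation to a two-sided test at about the 5% level, and the function names are hypothetical.

```python
import math
import random


def correlation(x, y):
    """Pearson correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def count_spurious_controls(n_cases, n_candidates, seed=0):
    """Count pure-noise candidate 'controls' that pass a significance screen.

    Each candidate is generated independently of the outcome, so any that
    passes does so by chance alone.
    """
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_cases)]
    cutoff = 1.96 / math.sqrt(n_cases)  # approximate 5% two-sided cutoff for r
    passed = 0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_cases)]
        if abs(correlation(candidate, outcome)) > cutoff:
            passed += 1
    return passed


flagged = count_spurious_controls(n_cases=100, n_candidates=1000, seed=1)
print(f"{flagged} of 1000 irrelevant variables look 'significant'")
```

Expect a count on the order of 50, about one in twenty.  With 20 independent, irrelevant candidates, the chance that at least one passes is 1 - 0.95^20, roughly 64%, so a screen of this kind will usually admit something that does not belong.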

Analogously, small-sample studies (commonly considered qualitative) suffer from failures to control for conflated influences.  This is partly because the room to take causal interactions into account shrinks with sample size, but it also too often signals inadequate research design and overreaching interpretations.
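The arithmetic behind the shrinking room for comparison is simple: each added condition multiplies the number of combinations a controlled comparison must cover.  The sketch assumes a hypothetical study of 30 interviews and binary conditions (for example class, race, gender, and region), and reports average cases per combination as if cases were spread evenly, which real samples rarely are.

```python
import math


def cases_per_stratum(n_cases, categories):
    """Number of strata in a full cross-classification, and average cases per stratum.

    `categories` lists how many categories each conditioning variable has.
    """
    n_strata = math.prod(categories)
    return n_strata, n_cases / n_strata


# Hypothetical study of 30 interviews, adding one binary condition at a time.
for k in range(5):
    strata, per = cases_per_stratum(30, [2] * k)
    print(f"{k} binary conditions: {strata:2d} strata, {per:5.2f} cases each")
```

With four binary conditions there are 16 combinations and fewer than two interviews per combination on average, too few to separate the conditions' influences from one another.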

All the above may seem an unneeded repetition of what is obvious to any graduate student or even advanced undergraduate in the social sciences.  Unfortunately, a sizable part of the work published in sociology and related disciplines still falls prey to these issues, which the statistically inclined refer to as omitted variables or, more generally, as model misspecification.  Such mistakes can be reasonably catalogued for statistical regression models: we omit variables that matter, we include ones that do not, we use the wrong functional form (for example, assuming two causal influences are additive when they are really multiplicative), we make mistaken assumptions about the way errors occur, and so on.  However, analogous issues exist for small-sample qualitative research and for theoretical arguments.  These are all mistakes we make about the possibilities of causal influences and how they relate to each other; avoiding them means guarding against the mistaken claims and inferences that result from seeing only a slice of the picture.
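A 2 x 2 worked example with invented outcomes shows what assuming additive influences when they are really multiplicative amounts to.  In both hypothetical tables, each cause alone raises the outcome by the same amount; only the cell where both causes are present differs.  The difference in differences (the interaction contrast) is zero exactly when the two causes combine additively on the scale being measured.

```python
import math


def table(outcome):
    """2 x 2 table of outcomes; cause 1 indexes rows, cause 2 columns (0 = absent, 1 = present)."""
    return [[outcome(x1, x2) for x2 in (0, 1)] for x1 in (0, 1)]


def interaction_contrast(t):
    """Difference in differences: zero when the two causes combine additively."""
    return (t[1][1] - t[1][0]) - (t[0][1] - t[0][0])


# Hypothetical outcomes: same single-cause cells, different joint cell.
additive = table(lambda x1, x2: 10 + 10 * x1 + 20 * x2)         # joint cell 40
multiplicative = table(lambda x1, x2: 10 * 2 ** x1 * 3 ** x2)    # joint cell 60
logged = [[math.log(v) for v in row] for row in multiplicative]

print(interaction_contrast(additive))        # 0
print(interaction_contrast(multiplicative))  # 20
print(math.isclose(interaction_contrast(logged), 0.0, abs_tol=1e-12))  # True
```

An additive model fitted to the multiplicative table would predict 40 for the joint cell and miss by 20.  Taking logs makes this particular multiplicative pattern additive again, a reminder that whether influences “interact” depends partly on the scale on which the outcome is measured.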

We have no general solution to these issues.  But awareness and vigilance can go a long way.