Notes: Data Suitability and Quality

Data are the foundations of all research. If data are good, then even a mediocre researcher can discover interesting and worthwhile results. If data are poor, then even a highly skilled researcher will be at a loss to produce anything useful.

While data raise many questions for a sociological research project, perhaps the two most fundamental concern the suitability and the quality of the data. The suitability of the data depends on how well they fit the needs of our research questions. The quality of the data depends on how accurately they reflect and measure the social realities to which they refer.

Social data always fall short of the ideal. A fully suitable set of data would contain indicators for every possible cause, mediator, and outcome suggested by every plausible alternative theory to answer our research question. These indicators would, individually or in combinations, provide a sound, valid representation of the “operational definitions” and, more distantly, the “theoretical concepts” that provide the content and background of our research question. Also, we would have enough cases that we could hope to distinguish all the relationships among these things. In reality, the data available for a project will be a subset of this ideal. If we rely on existing data sets – such as surveys, censuses, or official records – the data will have a limited set of variables, each of which reflects a way of establishing a “fact” that will have varied correspondence to our project needs. If we collect our own data, we can only record a limited number of observations from a limited number of subjects under circumscribed conditions. This is true regardless of whether we collect data through interviews, participant observation, experimentation, or the examination of records. So, the suitability of the data for our research goal becomes a judgment of how much leverage we have for solving our problem given the unavoidable limitations to what relevant social phenomena are represented in the data. (The restrictions of the data also compel us to revise and narrow the questions we ask, as it is obviously futile to pursue research questions for which we cannot get data adequate to produce a useful answer.)

The quality of the data is a more technical consideration, but it is also important. All data begin as observations. Commonly these are active observations in the sense that the observer does something to trigger the data production, such as asking a question. Whether we are the original observers or we use data collected by others, the observations reflect the character of the triggering stimulus, such as the wording of a question and the status of the person asking it (such as their authority or gender). The meaning of an observation also depends on the transparency, ambiguity, and volatility of what is being observed. All social data are subject to deception and bias. The more interpretation involved, whether by the observer or the subject, the more room there is for deception and bias. We see what we expect as a result of our theoretical orientation and our cultural background, and subjects report what they understand and feel is appropriate. The more observers and the more subjects there are, the more the variability among them can help correct for biases. Other than experiments, small-sample studies in which all the data are collected by the researcher are particularly vulnerable to such errors. The Achilles’ heel of “qualitative” studies is generally not that the small sample prevents generalization, but that they lack strict controls for the quality of the data, particularly for the influence of researcher bias.

Sociology Honors Seminar
