Social Science

Friday, Jan 21, 2011

Unlike physics or biology, the social sciences have not demonstrated the capacity to produce a substantial body of useful, nonobvious, and reliable predictive rules about what they study—that is, human social behavior, including the impact of proposed government programs.

The missing ingredient is controlled experimentation, which is what allows science to settle certain kinds of debates definitively. How do we know that our physical theories concerning the wing are true? In the end, not because of equations on blackboards or compelling speeches by famous physicists but because airplanes stay up. Social scientists may make claims as fascinating and counterintuitive as the proposition that a heavy piece of machinery can fly, but these claims are frequently untested by experiment, which means that debates like the 2009 dispute over fiscal stimulus will never be settled. For decades to come, we will continue to be lectured by what are, in effect, Keynesian and non-Keynesian economists.

Over many decades, social science has groped toward the goal of applying the experimental method to evaluate its theories for social improvement. Recent developments have made this much more practical, and the experimental revolution is finally reaching social science. The most fundamental lesson that emerges from such experimentation to date is that our scientific ignorance of the human condition remains profound. Despite confidently asserted empirical analysis, persuasive rhetoric, and claims to expertise, very few social-program interventions can be shown in controlled experiments to create real improvement in outcomes of interest.

James Lind is conventionally credited with executing the first clinical trial in the modern sense of the term. In 1747, he divided 12 scurvy-stricken crew members on the British ship Salisbury into six treatment groups of two sailors each. He treated each group with a different therapy, tried to hold all other potential causes of change to their condition as constant as possible, and observed that the two patients treated with citrus juice showed by far the greatest improvement.

But clinical trials depend heavily on ensuring that the treatment under evaluation is the only difference between the two groups. And as experiments began to move from fields like classical physics to fields like therapeutic biology, the number and complexity of potential causes of the outcome of interest, which I term “causal density,” rose substantially. It became difficult even to identify, never mind actually hold constant, all these causes.

In 1884, the brilliant but erratic American polymath C. S. Peirce hit upon a solution when he randomly assigned participants to the test and control groups. Random assignment permits a medical experimentalist to conclude reliably that differences in outcome are caused by differences in treatment. That’s because even causal differences among individuals of which the experimentalist is unaware (say, a genetic predisposition) should be roughly equally distributed between the test and control groups, and therefore should not bias the result.
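The balancing effect of random assignment can be seen in a small simulation. The sketch below is illustrative only: the population size, the share of participants carrying the hidden trait, and the function names are all assumptions chosen for the example, not figures from any study discussed here. It compares a naive split of an ordered roster with a random split of the same roster.

```python
import random


def trait_gap(group_a, group_b):
    """Absolute difference in the share of trait carriers between two groups."""
    return abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))


def compare_assignments(n_participants=10_000, n_with_trait=3_000, seed=0):
    """Return (naive_gap, random_gap) for a hidden trait (all numbers hypothetical)."""
    rng = random.Random(seed)
    # Ordered roster: the first n_with_trait people carry a trait the
    # experimenter cannot observe (True), everyone else does not (False).
    population = [i < n_with_trait for i in range(n_participants)]
    half = n_participants // 2

    # Naive assignment: first half to the test group, second half to control.
    naive_gap = trait_gap(population[:half], population[half:])

    # Random assignment: shuffle a copy of the roster, then split it in half.
    shuffled = population[:]
    rng.shuffle(shuffled)
    random_gap = trait_gap(shuffled[:half], shuffled[half:])
    return naive_gap, random_gap


if __name__ == "__main__":
    naive_gap, random_gap = compare_assignments()
    print(f"gap with naive split:  {naive_gap:.3f}")
    print(f"gap with random split: {random_gap:.3f}")
```

Under these assumptions the naive split piles every trait carrier into one group, while the random split leaves the two groups with nearly the same share, even though the code that does the assigning never looks at the trait.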

In theory, social scientists, too, can use that approach to evaluate proposed government programs. In the social sciences, such experiments are normally termed “randomized field trials” (RFTs).

By about a quarter-century ago, however, it had become obvious to sophisticated experimentalists that the idea that we could settle a given policy debate with a sufficiently robust experiment was naive. The reason had to do with generalization, which is the Achilles’ heel of any experiment, whether randomized or not.

A detailed review of every regression model published between 1968 and 2005 in Criminology, a leading peer-reviewed journal, demonstrated that these models consistently failed to explain 80 to 90 percent of the variation in crime. Even worse, regression models built in the last few years are no better than models built 30 years ago.

But sophisticated experimentalists understood that because of the issue’s high causal density, there would be hidden conditionals to the simple rule that “mandatory-arrest policies will reduce domestic violence.” The only way to unearth these conditionals was to conduct replications of the original experiment under a variety of conditions. Indeed, Sherman’s own analysis of the Minnesota study called for such replications. So researchers replicated the RFT six times in cities across the country. In three of those studies, the test groups exposed to the mandatory-arrest policy again experienced a lower rate of rearrest than the control groups did. But in the other three, the test groups had a higher rearrest rate.
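One way to see how little a three-and-three split says about a universal rule is to ask how often six replications would divide that way if the direction of each result were no better than a coin flip. The sketch below is a back-of-the-envelope calculation, not an analysis from the studies themselves; the coin-flip assumption and the function name are mine.

```python
from math import comb


def prob_exact_split(n_trials: int, n_favorable: int, p_favorable: float = 0.5) -> float:
    """Binomial probability of exactly n_favorable favorable results in n_trials."""
    return (
        comb(n_trials, n_favorable)
        * p_favorable ** n_favorable
        * (1 - p_favorable) ** (n_trials - n_favorable)
    )


if __name__ == "__main__":
    # Six replications, three of which showed a lower rearrest rate.
    print(prob_exact_split(6, 3))  # 0.3125
```

Under that assumption, an even split is the single most likely outcome, which is consistent with the view that the simple rule holds only under conditions the original experiment did not reveal.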

From those 122 criminology experiments, I extracted the 103 that were conducted in the United States and grouped them into 40 “program concepts”: mandatory arrest for domestic violence, intensive probation, and so on. Of these 40 concepts, 22 had more than one trial. Of those 22, only one worked each time it was tested: nuisance abatement, in which the owners of blighted properties were encouraged to clean them up. And even nuisance abatement underwent only two trials.
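The proportions implied by those counts are worth spelling out. The snippet below only restates the figures in the paragraph above as shares; the variable names are mine.

```python
from fractions import Fraction

# Counts taken directly from the paragraph above.
total_experiments = 122
us_experiments = 103
program_concepts = 40
concepts_with_replication = 22
concepts_that_always_worked = 1

share_us = Fraction(us_experiments, total_experiments)
share_replicated = Fraction(concepts_with_replication, program_concepts)
share_consistent = Fraction(concepts_that_always_worked, concepts_with_replication)

if __name__ == "__main__":
    print(f"experiments conducted in the United States: {float(share_us):.1%}")
    print(f"program concepts tested more than once:     {float(share_replicated):.1%}")
    print(f"replicated concepts that worked every time: {float(share_consistent):.1%}")
```

Fewer than one in twenty of the replicated program concepts succeeded in every trial.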

So what do we know, based on this series of experiments, about reducing crime? First, that most promising ideas have not been shown to work reliably. Second, that nuisance abatement—which is at the core of what is often called “Broken Windows” policing—tentatively appears to work. Even that conclusion needs qualification: it’s a safe bet that there is some jurisdiction in the United States where even Broken Windows would fail. We must remain open to the iconoclast who will find the limits of our conclusions—just as the hard sciences always devote some resources to those who try to unseat conventional wisdom. That is, experimentation does not create absolute knowledge but rather changes both the burden and the standard of proof for those who disagree with its findings.

What businesses have figured out is that they can deal with the problem of causal density by scaling up the testing process. Run enough tests, and you can find predictive rules that are sufficiently nuanced to be of practical use in the very complex environment of real-world human decision making. This approach places great emphasis on executing many fast, cheap tests in rapid succession, rather than big, one-time “moon shots.” It’s something like the replacement of craft work by mass production. The crucial step was to lower the cost and time of each test, which doesn’t simply make the process more efficient but, by allowing many more test iterations, leads to faster and more useful learning.

First, few programs can be shown to work in properly randomized and replicated trials. Despite complex and impressive-sounding empirical arguments by advocates and analysts, we should be very skeptical of claims for the effectiveness of new, counterintuitive programs and policies, and we should be reluctant to trump the trial-and-error process of social evolution in matters of economics or social policy.

Second, within this universe of programs that are far more likely to fail than succeed, programs that try to change people are even more likely to fail than those that try to change incentives.

And third, there is no magic. Those rare programs that do work usually lead to improvements that are quite modest, compared with the size of the problems they are meant to address or the dreams of advocates.

It is tempting to argue that we are at the beginning of an experimental revolution in social science that will ultimately lead to unimaginable discoveries. But we should be skeptical of that argument. The experimental revolution is like a huge wave that has lost power as it has moved through topics of increasing complexity. Physics was entirely transformed. Therapeutic biology had higher causal density, but it could often rely on the assumption of uniform biological response to generalize findings reliably from randomized trials. The even higher causal densities in social sciences make generalization from even properly randomized experiments hazardous. It would likely require the reduction of social science to biology to accomplish a true revolution in our understanding of human society—and that remains, as yet, beyond the grasp of science.

At the moment, it is certain that we do not have anything remotely approaching a scientific understanding of human society. And the methods of experimental social science are not close to providing one within the foreseeable future. Science may someday allow us to predict human behavior comprehensively and reliably. Until then, we need to keep stumbling forward with trial-and-error learning as best we can.

What Social Science Does—and Doesn’t—Know, Jim Manzi, 2010,