Apr 20, 2014
Much of my current research involves randomized controlled trials (RCTs), in which participants or clusters of participants are randomized into a treatment group (or groups) and a control group. The treatment group gets some sort of intervention and the control group does not. If the sample size is large enough, randomization ensures that the groups are ex ante identical in expectation, so that any difference in outcomes can be causally attributed to the intervention and not, e.g., to selection bias or existing trends.
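To make the "large enough sample" point concrete, here is a minimal simulation sketch. It is only an illustration: the standard-normal baseline covariate, the function names `baseline_gap` and `average_gap`, and the sample sizes are all assumptions of mine, not taken from any particular study. It randomizes a hypothetical population into two equal groups and measures how far apart the groups' average baseline scores end up.

```python
import random
import statistics


def baseline_gap(n, seed):
    """Randomize n participants into two equal groups and return the
    absolute difference in mean baseline score between the groups."""
    rng = random.Random(seed)
    # Hypothetical pre-treatment covariate (e.g. prior health), standard normal.
    baseline = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Random assignment: shuffle the indices; the first half is "treatment".
    order = list(range(n))
    rng.shuffle(order)
    treated = [baseline[i] for i in order[: n // 2]]
    control = [baseline[i] for i in order[n // 2:]]
    return abs(statistics.fmean(treated) - statistics.fmean(control))


def average_gap(n, replications=200):
    """Average the baseline gap over many independent randomizations."""
    gaps = [baseline_gap(n, seed) for seed in range(replications)]
    return statistics.fmean(gaps)


if __name__ == "__main__":
    for n in (20, 200, 2000):
        print(n, round(average_gap(n), 3))
```

Under these assumptions the average gap shrinks as n grows. That is the sense in which randomization makes the groups comparable before treatment: any single small trial can still be unbalanced by chance.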
A couple of colleagues and I started talking about the historical origins of randomization in science. Most people associate this approach with clinical trials in medicine and epidemiology, which is indeed where it gained fame, but I was pretty sure that it had first been used earlier, in agriculture, by the pioneering statistician and biologist Ronald Fisher. I was half right: Fisher did use it earlier, and his text on experimental design is a classic treatise laying out the principles of randomization, but he was not first.
Apparently the first reference to comparing treatment and control groups is in the Book of Daniel in the Bible, where King Nebuchadnezzar orders his people to eat only meat and drink only wine (for their health). When some object, he allows them to eat only legumes and drink only water for 10 days, after which he compares their health to the “control” group of meat-eaters. They do better, so he allows them to continue their diet. Note, however, the lack of randomization and hence the possibility of a strong selection effect. The king appears to implicitly understand this, since he does not order everyone to switch to the new diet.
That story is told in this interesting medical history of clinical trials (as well as this follow-up article). It also discusses the famous streptomycin (for tuberculosis) trial of 1946 (published 1948), which is generally considered the first randomized trial in medicine. Indeed the Wikipedia article on RCTs calls it the first published RCT, although as we shall see this is essentially incorrect. One of the medical histories bizarrely states, without citation, “The idea of randomization was introduced in 1923.” This is wrong by approximately three centuries.
A few years earlier, during the war, the British Medical Research Council performed a trial of patulin (related to penicillin) for the common cold. They didn’t explicitly randomize; instead they alternated patients between treatment and control. That strikes me as a generally valid approach, and they also deserve a lot of credit for carefully double-blinding the study. Apparently there was also an early randomized experiment to test a potential immunization against whooping cough. But neither of these interventions proved efficacious, and the streptomycin trial was published first, so it takes the credit (within medicine).
Meanwhile the medical literature points back to Dr. James Lind, an 18th-century physician who decided to test various methods for treating scurvy while serving as a naval ship’s surgeon in 1747. He took 12 diseased sailors, split them into groups of two, and gave each of the six groups a different treatment (cider, sea water, nutmeg, etc.). After 6 days one group showed marked improvement: those who had received oranges and lemons. Formally this wasn’t randomized, and the sample size was very small, but this is probably the first rigorous experiment along such lines.
However, the idea of randomization dates back to the 17th-century Belgian physician Van Helmont. As everyone knew at the time, bloodletting was a great cure for most ailments. Van Helmont agreed, but he thought that evacuation (i.e. inducing vomiting and defecation) was even better. To settle the argument, he proposed:
Let us take out of the Hospitals, out of the Camps, or from elsewhere, 200 or 500 poor People, that have Fevers, Pleurisies, etc. Let us divide them in halfes, let us cast lots, that one half of them may fall to my share, and the other to yours; I will cure them without bloodletting… we shall see how many Funerals both of us shall have.
Bingo! Of course they were all completely misguided, and for better or worse the experiment never actually happened, but this is a perfect description of an RCT.
This reference has appeared in the economics literature, where it looks like it was taken from this paper (published in BMJ). Indeed almost all the references to Van Helmont in a Google search seem to be derived from that paper, since they all give a date of 1662 (as it does) for the original idea. Van Helmont, however, died in 1644. I took the quote above from an article I happened to see in Natural History (June 2013) written by Druin Burch, who cites it as being from a collection of Van Helmont’s writings published posthumously by his son in 1648. Burch is a doctor who teaches at Oxford and has written a history of medicine.
So where are we? Van Helmont is the first person known to have realized that the optimal way to compare two remedies is via randomization, sometime before 1644. Lind seems to have performed the first actual comparative clinical test, with important conceptual and practical consequences, albeit not strictly an RCT. Later we have the streptomycin paper that sets randomization as the “gold standard” in medicine. But the first proper scientific RCT appears to have come from psychology.
This nice review article describes some of the gradual evolution of randomized comparison groups in social science, beginning with psychophysics and moving on to the psychology of education and elsewhere. There is no crisp, clean revelation of the benefits of randomization (and indeed the idea was centuries old by then), but rather a gradual sense that this was the right way to do science and to learn about the world.
If one had to specify a particular article, a paper on perceived sensation by Peirce and Jastrow (published 1885) may be the first to explicitly randomize. They do it trial-by-trial, rather than into a single large treatment and control group, although of course one can simply define the treatment group as all trials under a given condition. [Even if this doesn’t meet your preferred definition of an RCT, later work in the psychology of education that also predates Fisher would, so social science takes the prize.] The randomization was done by means of well-shuffled playing cards, with the color red or black determining the category of trial. One reason for the randomization was that the experiment was only single-blinded, so they write that through their technique
…any possible psychological guessing of what change the operator was likely to select was avoided. A slight disadvantage in this mode of proceeding arises from the long runs of one particular kind of change, which would occasionally be produced by chance and would tend to confuse the mind of the subject. But it seems clear that this disadvantage was less than that which would have been occasioned by his knowing that there would be no such long runs if any means had been taken to prevent them.
I love it! Exactly the type of thing we worry about now as well; for instance see footnote 11 in this paper of mine.
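The trade-off Peirce and Jastrow describe can be sketched in a few lines. Note the assumptions: the source says only that they used well-shuffled playing cards with red or black determining the trial type, so the standard 52-card deck (26 red, 26 black) and the helper names below are mine, purely for illustration. The sketch compares the longest run of same-type trials under a random shuffle with that under strict alternation, which has no long runs but is perfectly guessable.

```python
import random


def shuffled_colors(rng):
    """Return the red/black sequence of a shuffled deck.
    Assumes a standard 52-card deck (26 red, 26 black)."""
    deck = ["R"] * 26 + ["B"] * 26
    rng.shuffle(deck)
    return deck


def longest_run(seq):
    """Length of the longest stretch of identical consecutive items."""
    if not seq:
        return 0
    best = current = 1
    for prev, cur in zip(seq, seq[1:]):
        current = current + 1 if cur == prev else 1
        best = max(best, current)
    return best


def strict_alternation(n):
    """A non-random schedule: R, B, R, B, ... (no runs, fully predictable)."""
    return ["R" if i % 2 == 0 else "B" for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(1885)
    runs = [longest_run(shuffled_colors(rng)) for _ in range(1000)]
    print("mean longest run, shuffled deck:", sum(runs) / len(runs))
    print("longest run, strict alternation:", longest_run(strict_alternation(52)))
```

A shuffled deck will typically contain runs of several trials of the same kind, which is exactly the "slight disadvantage" they accept in exchange for a sequence the subject cannot predict.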
In any case I welcome reader comments and additions to this history, which I’m sure is incomplete. I found the abstract of another history of clinical research, but the article seems to be in Hebrew. It discusses some of the above (with the wrong century for Lind), though not the work in psychology. It mentions work by the French physiologist (not psychologist) Claude Bernard, who I note was born in Saint-Julien. His 1865 discourse on the study of experimental medicine justly highlighted the role of science (hypotheses, experiments, statistics) in the future development of medicine and public health. But as far as I can tell there is no mention of randomization…
[Update 4/21: Helpful readers pointed out two additional articles in the medical literature, a more complete description and interpretation of the biblical story as well as more details on the early trial of patulin. Some also suggested the story of Gideon and the fleece (from the Book of Judges, apparently predating Daniel by 500 years) as the very first biblical reference, but I’m not convinced. Gideon tests God’s will by asking him to put dew on a fleece one night, and to put dew around the fleece the next night. Perhaps formally that constitutes a within-subject (note the singular!) test, but the Hawthorne “experimenter demand” effect is so blatant as to render it almost meaningless.]