A history of randomization

Much of my current research involves randomized controlled trials (RCTs), in which participants or clusters of participants are randomized into a treatment group (or groups) and a control group. The treatment group gets some sort of intervention and the control group does not. If the sample size is large enough, randomization ensures that the groups are ex ante comparable (identical in expectation), so any difference in outcomes can be causally attributed to the intervention and not, e.g., to selection bias or existing trends.
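To make the mechanics concrete, here is a minimal sketch in Python of what "randomizing into treatment and control" can look like. Everything in it is my own illustration rather than anything taken from a real trial: the function names, the made-up "age" covariate, and the choice of an even split are all assumptions.

```python
import random
import statistics


def randomize(participants, seed=None):
    """Shuffle a copy of the participants and split it into two halves.

    Returns (treatment, control). The even split is an illustrative
    assumption; real trials may use other allocation ratios.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def covariate_gap(n, seed=0):
    """Simulate n hypothetical ages, randomize them, and return the
    absolute difference in mean age between the two groups."""
    rng = random.Random(seed)
    ages = [rng.gauss(40, 10) for _ in range(n)]  # made-up covariate
    treatment, control = randomize(ages, seed=seed + 1)
    return abs(statistics.mean(treatment) - statistics.mean(control))


if __name__ == "__main__":
    # The gap between groups tends to shrink as the sample grows,
    # which is the "large enough sample" caveat in action.
    for n in (20, 200, 20000):
        print(n, round(covariate_gap(n), 3))
```

The point of the sketch is only that assignment depends on the shuffle and on nothing about the participants themselves, which is what rules out selection effects on average.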

A couple of colleagues and I started talking about the historical origins of randomization in science. Most people associate this approach with clinical trials in medicine and epidemiology, which is indeed where it gained fame, but I was pretty sure that it had first been used earlier, in agriculture, by the pioneering statistician and biologist Ronald Fisher. I was half right: Fisher did use it earlier, and his text on experimental design is a classic treatise laying out the principles of randomization, but he was not first.

Apparently the first reference to comparing treatment and control groups is in the Book of Daniel in the Bible, where King Nebuchadnezzar orders his people to eat only meat and drink only wine (for their health). When some object, he allows them to eat only legumes and drink only water for 10 days, after which he compares their health to the “control” group of meat-eaters. They do better, so he allows them to continue their diet. Note, however, the lack of randomization and hence the possibility of a strong selection effect. The king appears to implicitly understand this, since he does not order everyone to switch to the new diet.

That story is told in this interesting medical history of clinical trials (as well as this follow-up article). It also discusses the famous streptomycin (for tuberculosis) trial of 1946 (published 1948), which is generally considered the first randomized trial in medicine. Indeed the Wikipedia article on RCTs calls it the first published RCT, although as we shall see this is essentially incorrect. One of the medical histories bizarrely states, without citation, “The idea of randomization was introduced in 1923.” This is wrong by approximately three centuries.

A few years earlier, during the war, the British Medical Research Council performed a trial of patulin (related to penicillin) for the common cold. They didn’t explicitly randomize; instead they alternated patients between treatment and control. That strikes me as a generally valid approach, and they also deserve a lot of credit for carefully double-blinding the study. Apparently there was also an early randomized experiment to test a potential immunization against whooping cough. But neither of these interventions proved efficacious, and the streptomycin trial was published first, so it takes the credit (within medicine).

Meanwhile the medical literature points back to Dr. James Lind, an 18th-century physician who decided to test various methods for treating scurvy while serving as a naval ship’s surgeon in 1747. He took 12 diseased sailors, split them into groups of two, and gave each of the six groups a different treatment (cider, sea water, nutmeg, etc). After 6 days one group showed marked improvement: those who had received oranges and lemons. Formally this wasn’t randomized, and the sample size was very small, but this is probably the first rigorous experiment along such lines.

However, the idea of randomization dates back to the 17th-century Belgian physician Van Helmont. As everyone knew at the time, bloodletting was a great cure for most ailments. Van Helmont disagreed: he thought that evacuation (i.e. inducing vomiting and defecation) was even better. To settle the argument, he proposed:

Let us take out of the Hospitals, out of the Camps, or from elsewhere, 200 or 500 poor People, that have Fevers, Pleurisies, etc. Let us divide them in halfes, let us cast lots, that one half of them may fall to my share, and the other to yours; I will cure them without bloodletting… we shall see how many Funerals both of us shall have.

Bingo! Of course they were all completely misguided, and for better or worse the experiment never actually happened, but this is a perfect description of an RCT.

This reference has appeared in the economics literature, where it looks like it was taken from this paper (published in BMJ). Indeed almost all the references to Van Helmont in a Google search seem to be derived from that paper, since they all give a date of 1662 (as it does) for the original idea. Van Helmont, however, died in 1644. I took the quote above from an article I happened to see in Natural History (June 2013) written by Druin Burch, who cites it as being from a collection of Van Helmont’s writings published posthumously by his son in 1648. Burch is a doctor who teaches at Oxford and has written a history of medicine.

So where are we? Van Helmont is the first person known to have realized that the optimal way to compare two remedies is via randomization, sometime before 1644. Lind seems to have performed the first actual comparative clinical test, with important conceptual and practical consequences, albeit not strictly an RCT. Later we have the streptomycin paper that sets randomization as the “gold standard” in medicine. But the first proper scientific RCT appears to have come from psychology.

This nice review article describes some of the gradual evolution of randomized comparison groups in social science, beginning with psycho-physics and moving on to the psychology of education and elsewhere. There is no crisp, clean revelation of the benefits of randomization (and indeed the idea was centuries old by then), but rather a gradual sense that this was the right way to do science and to learn about the world.

If one had to specify a particular article, a paper on perceived sensation by Peirce and Jastrow (published 1885) may be the first to explicitly randomize. They do it trial-by-trial, rather than into a single large treatment and control group, although of course one can simply define the treatment group as all trials under a given condition. [Even if this doesn’t meet your preferred definition of an RCT, later work in the psychology of education that also predates Fisher would, so social science takes the prize.] The randomization was done by means of well-shuffled playing cards, with the color red or black determining the category of trial. One reason for the randomization was that the experiment was only single-blinded, so they write that through their technique

…any possible psychological guessing of what change the operator was likely to select was avoided. A slight disadvantage in this mode of proceeding arises from the long runs of one particular kind of change, which would occasionally be produced by chance and would tend to confuse the mind of the subject. But it seems clear that this disadvantage was less than that which would have been occasioned by his knowing that there would be no such long runs if any means had been taken to prevent them.

I love it! Exactly the type of thing we worry about now as well; for instance see footnote 11 in this paper of mine.
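For the curious, the "long runs" that Peirce and Jastrow mention are easy to see in a quick simulation. This is only a rough sketch: they say nothing about the size or composition of their deck, so the standard 52-card deck (26 red, 26 black) is my assumption, as are the function names.

```python
import random


def shuffled_colors(seed=None):
    """Return the color sequence of a shuffled deck.

    Assumes a standard 52-card deck (26 red, 26 black); the original
    paper does not specify the deck, so this is illustrative only.
    """
    rng = random.Random(seed)
    deck = ["red"] * 26 + ["black"] * 26
    rng.shuffle(deck)
    return deck


def longest_run(sequence):
    """Length of the longest stretch of identical consecutive items."""
    longest = 0
    current = 0
    previous = None
    for item in sequence:
        # Extend the current run if the item repeats, else start a new one.
        current = current + 1 if item == previous else 1
        previous = item
        longest = max(longest, current)
    return longest


if __name__ == "__main__":
    # Print the longest same-color run for a few different shuffles.
    for seed in range(5):
        print(seed, longest_run(shuffled_colors(seed)))
```

A strictly alternating sequence would have a longest run of 1 but would be perfectly predictable by the subject, which is exactly the trade-off they describe.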

In any case I welcome reader comments and additions to this history, which I’m sure is incomplete. I found the abstract of another history of clinical research, but the article seems to be in Hebrew. It discusses some of the above (with the wrong century for Lind), though not the work in psychology. The authors mention work by the French physiologist (not psychologist) Claude Bernard, who I note was born in Saint-Julien. His 1865 discourse on the study of experimental medicine justly highlighted the role of science (hypotheses, experiments, statistics) in the future development of medicine and public health. But as far as I can tell there is no mention of randomization…

[update 4/21: Helpful readers pointed out two additional articles in the medical literature, a more complete description and interpretation of the biblical story as well as more details on the early trial of patulin. Some also suggested the story of Gideon and the fleece (from the Book of Judges, apparently predating Daniel by 500 years) as the very first biblical reference, but I’m not convinced. Gideon tests God’s will by asking him to put dew on a fleece one night, and to put dew around the fleece the next night. Perhaps formally that constitutes a within-subject (note the singular!) test, but the Hawthorne “experimenter demand” effect is so blatant as to render it almost meaningless.]

Category: Psychology, Research, Science

10 Responses

  1. Lior Pachter says:

    Allopatric speciation after vicariance provides ancient (millions of years) examples of (natural) randomization trials testing key elements of Darwin’s theory of evolution.

    • Julian Jamison says:

      Lior: excellent point! It seems almost unfair that Nature had done all of Darwin’s hard work for him, carefully testing his theory thousands of times and then giving him the data…

  2. Chris Lysy says:

    “Later we have the streptomycin paper that sets randomization as the “gold standard” in medicine.”

    Literally or figuratively?

    I’ve been trying to find the point at which the RCT was declared the “gold standard.” I’ve been unable to source the connection and was guessing that it happened sometime in the 80s in the field of medicine.

    Any insight?

    The only thing I could find were assumptions. The closest thing I have to a direct connection is to Harry Gold’s Standard (the comparison group in Gold’s early double-blind work), but that just seems like a bizarre coincidence.

    • Julian Jamison says:

      Chris – great question. I used it figuratively (indeed I debated whether to use that phrase in my post at all, since it is almost a cliche at this point) and would also be curious to know when it came into use in this way. I don’t think it was Harry Gold, since the general phrase “gold standard” is a long-standing technical term in economics having to do with the value of money being fixed relative to gold. But I have no idea when it came to be associated with [medical] RCTs.

      • Chris Lysy says:

        Thanks Julian,
        There’s definitely a lot of rhetoric.

        Regarding the “long-standing technical term.” At the time when the early RCT work was being done, the economic “gold standard” seemed to really just refer to gold. I didn’t dip into the economic literature, so I could definitely be missing a lot. I found it didn’t really jump to its new usage until more recent times.

        I don’t think it was Harry Gold either. I couldn’t find evidence of anyone else using the phrase Gold Standard in that way. I was looking into this (RCT as the Gold Standard) from the field of evaluation. The roots seemed to go back to the field of medicine. The Harry Gold thing seems too strange to be true, so I’ve been trying hard to find contradictory evidence that is more than just assumptions. Still looking…

        • Julian Jamison says:

          Yes – I think I’m agreeing with you. The economics term referred only to gold (the metal), not an idealized research approach. But I do think the word “gold” in the current usage was somehow derived from the metal and monetary policy, rather than from Harry Gold. When or how it switched to being about a benchmark I have no idea, but almost certainly not within economics. Let me know if you find out anything more…

          • Chris Lysy says:

            Will do. Thanks for the help.

            I agree with your point too, I think our common understanding of the gold standard is based on the monetary policy.

            What I question is whether or not “Gold Standard” was originally a homonym, at least with respect to RCTs. It’s so strongly connected I expected to find a source easily, I could not.

            Ultimately, in current usage, it doesn’t matter. In my own field debate over the method’s appropriateness continues, with much of it based on the rhetoric.

  3. Brajesh Singh says:

    the maximum number of RCTs are done in India, so all credit as of now must go to Indians

  4. […] year I wrote a blog post about the history of random design after it came up in a couple of different settings. I have now […]


Julian C. Jamison

I'm an economist, researcher, traveler, runner, and astronaut-in-waiting. I enjoy pondering human behavior, including both what we do and what we ought to do - either to maximize our well-being or in pursuit of some other goal.

The views and opinions expressed on this website are solely those of Julian C. Jamison and other occasional authors, and they do not necessarily represent the views of the Consumer Financial Protection Bureau or the United States.