The distinction between convenience sampling and representative sampling recently came up at work. It turns out to be very important for certain legal and process reasons whether a particular study claims to have used a representative sample. At first blush the distinction seems clear (see below), but after thinking about it harder I decided that it’s not clear at all. As with so many things, there is a continuum involved rather than a binary split — which puts the law in a rather weak position.
A representative sample accurately reflects the overall distribution of the population of interest, often because it is composed of a [uniformly] random subset of that population which is large enough for the law of large numbers (LLN) to kick in. This is the gold standard, since it allows you to generalize the results of your study to the entire population. On the other hand, a convenience sample is drawn from whatever happens to be close at hand, e.g. surveying people at a mall. This is often used for pilot testing, and the results can’t (or at least shouldn’t) be generalized.
Straightforward, right? Well… what if you don’t have a lot of gift cards to hand out, so you survey only a randomly chosen third of the people at the mall? This is now a nice random sample, perfectly representative of at least the friendly mall-goers in that locale at that time. To which the purist admits that yes, it is indeed representative of some population, but not a relevant population. Whether it is relevant, however, depends on your goals. To put it differently: what population are you interested in?
Many people are interested in the population of Americans. For instance, to determine whether a new federal drug policy will differentially affect men and women, you would need to survey a representative sample of Americans. No problem: just get out your handy list of every single American, with email and phone number and address, then pick a random subset of them, and make sure you get a 100% response rate. What’s that you say? No such list exists, even at the NSA, and no large survey ever gets a 100% response rate? You are correct.
Everyone who is a government conspiracy theorist, or lives off the grid, or is in jail, or is a spy, or is an undocumented alien, or has dementia, or is under five years old, or who simply doesn’t waste their time doing surveys… isn’t very likely to respond. And that’s if you had the magic roster to begin with. So you end up with a nice representative sample of the following population: Americans you could find, of a certain age and legality and temperament and so on. The point is that you will never get a representative sample of all Americans (or even all working-age tax-paying Americans), and you will always get a representative sample of some population, even with the most casual convenience sample.
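To make that last point concrete, here is a toy simulation. Everything in it is made up for illustration: the 30% "hard to reach" group, the outcome values, the response probabilities, and the function name `simulate` are all my assumptions, not data from any real survey. It sketches one way of seeing that, even with the magic roster and a uniformly random draw, the people who actually respond track a different population (everyone weighted by their chance of responding) rather than the one you set out to study.

```python
import random
import statistics


def simulate(pop_size=100_000, sample_size=5_000, seed=0):
    """Toy model of nonresponse; all numbers are hypothetical."""
    rng = random.Random(seed)

    # Build a made-up population of (outcome, probability_of_responding).
    # Assumption: 30% are "hard to reach", differ on the outcome, and
    # rarely answer surveys.
    population = []
    for _ in range(pop_size):
        hard_to_reach = rng.random() < 0.30
        outcome = rng.gauss(60.0 if hard_to_reach else 40.0, 10.0)
        p_respond = 0.05 if hard_to_reach else 0.60
        population.append((outcome, p_respond))

    # The quantity we claim to be estimating.
    true_mean = statistics.fmean([o for o, _ in population])

    # The fantasy: a uniform draw from the roster with 100% response.
    invited = rng.sample(population, sample_size)
    ideal_mean = statistics.fmean([o for o, _ in invited])

    # The reality: same invitations, but only some people answer.
    responders = [o for o, p in invited if rng.random() < p]
    observed_mean = statistics.fmean(responders)

    # The population the responders actually represent: everyone,
    # weighted by how likely they are to respond.
    total_weight = sum(p for _, p in population)
    represented_mean = sum(o * p for o, p in population) / total_weight

    return {
        "true_mean": true_mean,
        "ideal_mean": ideal_mean,
        "observed_mean": observed_mean,
        "represented_mean": represented_mean,
        "n_responders": len(responders),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.2f}")
```

With these invented numbers, the full-response sample lands near the true mean, while the respondents land near the response-weighted mean, several points away. The sample of respondents is still "representative", just of a population nobody asked about.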
Of course this doesn’t imply that the two methods are equally good for a given purpose, but it does imply that there’s no bright dividing line between them. It also implies that by pretending that “representative” surveys are actually representative, we deprive ourselves of the opportunity to try to correct, however poorly, for some of the many biases listed above. And we are being intellectually lazy by not acknowledging the flaws in the methodology, some of which are highly nontrivial in many contexts of interest. [Think of drug policies and ask yourself whether the impact is likely to be the same, on average, for the kind of people who respond to surveys versus those who…