Error Margins Explained In Three (±1) Minutes

How is it possible that an opinion poll claims to know what the whole country thinks just by asking a few hundred people?

Ask a hundred people if they like cats and around 63% will say yes, give or take a few¹. If you want to be more accurate, you can even say that the number will be between 53 and 73. What if you asked a thousand people? Perhaps you’ll find 530 to 730 ailurophiles this time?

“Wisdom is the daughter of experience”, according to Leonardo da Vinci, and experiencing more responses to our cat question means we can actually make a wiser guess: with a sample size of one thousand people you should find that 600 to 660 of them like cats.

How this works

First, some terminology. In the first example above, we guessed that 53 to 73 out of 100 people like cats. As a percentage, this is 53% - 73%, or you could also write this as 63% ± 10%. Here, 63% is our estimate and 10% is our margin of error.

Second, a formula. For a sample of size N, the margin of error (in percentage points) is approximately $\frac{98}{\sqrt{N}}$.

When we ask 1,000 people, our margin of error is $\frac{98}{\sqrt{1000}}$ ≈ 3.1%, which we can round to 3%. Combining this with our estimate we get 63% ± 3%, or 600 to 660 people answering yes.
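As a quick check of the arithmetic, the formula can be written as a small Python function. This is only a sketch: the name `margin_of_error` is made up for illustration, and the constant 98 is taken straight from the formula above.

```python
import math

def margin_of_error(sample_size):
    """Approximate 95% margin of error, in percentage points, using 98 / sqrt(N)."""
    if sample_size <= 0:
        raise ValueError("sample size must be positive")
    return 98 / math.sqrt(sample_size)

# Try the sample sizes mentioned in this article.
for n in (100, 1000, 1967):
    print(f"N = {n}: ±{margin_of_error(n):.1f}%")
```

The last sample size, 1,967, is the one used by the cat poll in the first footnote, and the result agrees with that poll's stated ±2.2%.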

How many people would you need for a 1% margin of error? Working backwards, $N = \left(\frac{98}{ME}\right)^2$, where ME is the margin of error in percent, so a 1% error margin gives $N = \left(\frac{98}{1}\right)^2$ = 9,604 people.
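Working backwards is easy to script as well. The sketch below assumes the same 98 / √N formula; `required_sample_size` is a hypothetical helper name, and it rounds up because you can't survey a fraction of a person.

```python
import math

def required_sample_size(target_margin):
    """Smallest N whose margin of error (98 / sqrt(N)) is at most target_margin percent."""
    if target_margin <= 0:
        raise ValueError("target margin must be positive")
    return math.ceil((98 / target_margin) ** 2)

for target in (10, 3, 1):
    print(f"±{target}% needs about {required_sample_size(target):,} people")
```

Notice how quickly the cost grows: cutting the margin from 3% to 1% takes roughly nine times as many respondents.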

This should work most of the time

Let’s imagine you roam the country asking strangers on Tinder if they like cats (statistically, it’s an excellent opening line). Every day you ask 10 strangers, and can expect between 3 and 9 to say yes, but the exact number will change day to day. One day you might get 7, the next day 4, another you might find that all ten people say they like cats.

The formula above calculates a margin of error that’s correct 19 times out of 20, or 95% of the time. This figure is called the 95% confidence level. It is baked into the formula, so roughly one day in twenty you’ll find fewer than 3 or more than 9 ailurophiles in your Tinder survey.
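The “19 times out of 20” claim can be checked with a simulation. The sketch below repeats the 1,000-person poll from earlier many times and counts how often the result lands within the margin of error. It assumes the true share of cat lovers is exactly 63% and that every respondent answers independently; the function name, the number of repeats and the random seed are arbitrary choices for illustration. With very small samples, such as ten people a day, the approximation is rougher.

```python
import random

def simulate_coverage(sample_size=1000, true_share=0.63, trials=2000, seed=1):
    """Fraction of simulated polls that land within 98 / sqrt(N) points of the truth."""
    rng = random.Random(seed)            # fixed seed so the run is repeatable
    margin = 98 / sample_size ** 0.5     # margin of error in percentage points
    hits = 0
    for _ in range(trials):
        # Each simulated respondent says yes with probability true_share.
        yes = sum(rng.random() < true_share for _ in range(sample_size))
        observed = 100 * yes / sample_size
        if abs(observed - 100 * true_share) <= margin:
            hits += 1
    return hits / trials

coverage = simulate_coverage()
print(f"{coverage:.1%} of simulated polls fell within the margin of error")
```

The printed figure should land close to 95%, though it will vary a little with the seed and the number of repeats.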

Perhaps a one in twenty chance of being wrong is too uncertain, but one in a hundred might be acceptable? This means using a 99% confidence level, and the formula becomes $\frac{129}{\sqrt{N}}$ instead. Using this new formula, you’ll find that if you want to conduct your Tinder poll at a 99% confidence level, you can now expect to find between 2 and 10 cat lovers each day.
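The article doesn't say where 98 and 129 come from. One standard reading, assumed here, is that each is the z-score for the chosen confidence level multiplied by the worst-case standard deviation of a yes/no answer (0.5, for a 50/50 split) and scaled to percent. The sketch below uses Python's `statistics.NormalDist` to recover both constants and the daily Tinder ranges; `margin_constant` and `expected_range` are hypothetical names.

```python
from statistics import NormalDist

def margin_constant(confidence):
    """Numerator of the margin-of-error formula for a given confidence level.

    Assumes the worst-case 50/50 split (standard deviation 0.5) and
    scales the result to percentage points.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # two-sided z-score
    return 100 * 0.5 * z

def expected_range(estimate, sample_size, confidence):
    """Range of yes-counts implied by estimate (in percent) ± the margin of error."""
    margin = margin_constant(confidence) / sample_size ** 0.5
    low = max(0, round(sample_size * (estimate - margin) / 100))
    high = min(sample_size, round(sample_size * (estimate + margin) / 100))
    return low, high

for level in (0.95, 0.99):
    print(f"{level:.0%}: constant ≈ {margin_constant(level):.0f}, "
          f"daily range {expected_range(63, 10, level)}")
```

Under these assumptions the constants round to 98 and 129, and the daily ranges come out as 3 to 9 and 2 to 10, matching the figures in the text.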

When it doesn’t work so well

Some towns are dog towns, some towns are cat towns². Sometimes people will give the answer they think you want to hear, and sometimes people later change their minds and decide they prefer dinosaurs as pets. These are all examples of bias in a poll (sampling bias and response bias among them), which we’ll cover in a future article.

If you want to try this at home and estimate what percentage of people in your area like cats, you can use the margin of error to do this. If you ask 100 people and 80 say yes, then your estimate is 80% with a 10% margin of error (at a 95% confidence level). In other words, there’s a 19 out of 20 chance that the actual percentage is somewhere between 70% and 90%.
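If you do run your own survey, a few lines of Python can turn the raw counts into an estimate and a range. This is a minimal sketch assuming the 95% formula from earlier (98 / √N); `poll_estimate` is a hypothetical helper, and it clips the range so it never drops below 0% or rises above 100%.

```python
import math

def poll_estimate(yes_answers, sample_size):
    """Return (estimate, low, high) in percent, using the 98 / sqrt(N) margin."""
    if sample_size <= 0 or not 0 <= yes_answers <= sample_size:
        raise ValueError("need 0 <= yes_answers <= sample_size and sample_size > 0")
    estimate = 100 * yes_answers / sample_size
    margin = 98 / math.sqrt(sample_size)
    # Percentages cannot go below 0 or above 100, so clip the range.
    return estimate, max(0.0, estimate - margin), min(100.0, estimate + margin)

estimate, low, high = poll_estimate(80, 100)
print(f"Estimate {estimate:.0f}%, likely between {low:.1f}% and {high:.1f}%")
```

For 80 yes answers out of 100, the unrounded margin is 9.8%, so the range is 70.2% to 89.8%, which rounds to the 70% to 90% quoted above.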

There will always be a margin of error in any election poll. This is fine, since we can’t poll the whole country, but do remember that the headline figure is only an estimate. A larger sample size gives a smaller margin of error. However, the margin can also be made to look smaller by quoting a lower confidence level, so for reliable results check for both a large sample size and a high confidence level.

¹ 63% of US adults like cats either “a little” or “a lot”. The poll was conducted in October 2009 by GfK Roper Public Affairs & Media, involved telephone interviews with 1,967 US adults, and has a margin of error of ±2.2%.
² In Vermont, 50% of households own a cat, compared with only 25% of households in Utah. Alabama is definitely more of a dog state (44% of households) than a cat state (27% of households).