This campaign season we’ll hear a lot about “statistical ties.” The “statistical tie” misnomer is used to refer to the situation where one candidate leads another candidate but that lead is within the margin of error (MOE). However, what we’re really interested in is the probability that one candidate leads the other candidate.

Since polling all voters is a costly and time-consuming process, a random sample of voters is selected and, based on some assumptions, one can make a probabilistic judgement regarding the outcome of an election. Polling all voters yields the “true” percentage, while the random sample can only estimate the “true” value.

Every time a sample is taken, a (perhaps) different estimate of the “true” value is obtained. The estimate plus and minus the MOE is called the confidence interval. A 95% confidence level means that if the poll were repeated many times, about 95% of the intervals built this way would contain the “true” value. Loosely speaking, we are 95% confident that the “true” value lies within the interval from our one sample.

Consider two candidates, Smith and Jones. We want to know what percentage of the voters prefer Smith and what percentage prefer Jones. The “true” percentage is unknown, so we randomly sample the population to obtain an estimate. We want to be 95% confident that our estimate lies within three points of the “true” percentage. The figure at the right gives the sample size for different MOEs. Sample-size formulas usually assume an effectively infinite population; a correction factor is only required when the sample is a sizable fraction of a very small population.
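For readers who want to check the sample sizes themselves, here is a minimal sketch of the usual calculation. It assumes a simple random sample, an effectively infinite population, and the worst-case split of 50/50; the function name `sample_size` is made up for this illustration.

```python
from statistics import NormalDist

def sample_size(moe, confidence=0.95, p=0.5):
    """Approximate number of respondents needed for a given margin of error.

    Assumes a simple random sample from an effectively infinite population.
    p=0.5 is the worst case, because it gives the largest variance.
    """
    # Two-sided critical value: about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * z * p * (1 - p) / (moe * moe)

for moe in (0.02, 0.03, 0.04, 0.05):
    print(f"MOE {moe:.0%}: about {round(sample_size(moe))} respondents")
```

For a three-point MOE this comes out at roughly 1067 respondents. Rounding to the nearest whole person is a choice made here; some references round up instead.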

After our poll of 1067 likely voters, Smith leads Jones 49-46% with an MOE of 3%. Some would call this a “statistical tie” since the lead is within the MOE. However, the following table tells us that Smith’s lead is 84% probable.
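One common way to arrive at a figure like this is a normal approximation to the difference between the two shares. The sketch below is an assumption about the method, not necessarily how the table was built: it treats both shares as coming from the same simple random sample, so the variance of the lead is (p1 + p2 - (p1 - p2)^2) / n. The function name `lead_probability` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def lead_probability(p_leader, p_trailer, n):
    """Approximate probability that the leader is truly ahead.

    Both shares come from the same sample of size n, so they are
    negatively correlated; the variance below accounts for that.
    """
    diff = p_leader - p_trailer
    std_err = sqrt((p_leader + p_trailer - diff * diff) / n)
    # Probability that a normal variable centered on the observed lead is above zero.
    return NormalDist().cdf(diff / std_err)

prob = lead_probability(0.49, 0.46, 1067)
print(f"Probability that Smith leads: {prob:.0%}")
```

With 49-46% and 1067 respondents, this approximation gives about 84%, in line with the figure quoted above.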

H/T: Kevin Drum, Fritz Scheuren.

Tony C. 1, October 7, 2012 at 3:30 pm

@Matt: Who are you yelling at? If it is me, a friend of mine from college became an actuary for an insurance company; he has no problem at all with political polling, and reads Nate Silver’s analyses regularly.

What exactly would you propose as an actuarial approach to predicting the outcome of the election that does not involve polling people’s opinions of the candidates or their policies?

@mahtso: The selection is random; typically it is literally random numbers being dialed until somebody answers.

A valid reason for weighting is that one can discover, from previous polls, that for whatever reason a definable segment of people do not participate as much in the polling as they do at the voting booth.

For example, some segment may screen calls more than others, or may have only cell phones.

So if you find out from past elections that your polling method under-selects Democrats and over-selects Republicans, you can find a weighting for the people you DO choose that brings the polling for those past elections as closely as possible into line with the actual outcomes, and apply that to the current polling. So you might increase the weight of the few young respondents you have, decrease the weight of the elderly, who are more likely to have land lines and free time to participate, increase the weight of the few working mothers who found the time to respond, and so forth.

Unless the pollster is corrupt, the weighting is not done in the interest of biasing the outcome; it is done in the interest of using past polling outcomes and experience to better predict the outcome.

Built into those weights will be the implicit adjustment for people that say they intend to vote but do not, and vice versa.
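To make the idea concrete, here is a rough sketch of the simplest form of weighting: each group is scaled so that its share of the sample matches its assumed share of actual voters. This is plain reweighting to target shares, not the calibration against past election results described above, and every number and name in it is invented for illustration.

```python
# Hypothetical poll: group -> (respondents, respondents backing candidate A).
sample = {
    "young": (100, 60),
    "middle": (400, 200),
    "elderly": (500, 200),
}

# Hypothetical share of each group among people who actually vote.
target_share = {"young": 0.20, "middle": 0.45, "elderly": 0.35}

def weighted_support(sample, target_share):
    """Support for candidate A after reweighting groups to their target shares."""
    total_n = sum(n for n, _ in sample.values())
    weighted_yes = 0.0
    weighted_n = 0.0
    for group, (n, yes) in sample.items():
        # Under-sampled groups get a weight above 1, over-sampled groups below 1.
        weight = target_share[group] / (n / total_n)
        weighted_yes += weight * yes
        weighted_n += weight * n
    return weighted_yes / weighted_n

raw_support = sum(yes for _, yes in sample.values()) / sum(n for n, _ in sample.values())
print(f"raw: {raw_support:.1%}, weighted: {weighted_support(sample, target_share):.1%}")
```

In this made-up sample the young respondents are under-represented, so giving them more weight moves candidate A from 46.0% to 48.5%.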

“Since polling all voters is a costly and time consuming process, a random sample of voters is selected and, based on some assumptions, one can make a probabilistic judgement regarding the outcome of an election. Polling all voters yields the “true” percentage while the random sample can only estimate the “true” value.”

Are the samples actually random? I’ve seen reports that some of the recent polls have taken 10% more of one party than the other, which is not likely to happen in the presidential election.

For polls before the election there are no “voters.” The statistics are not necessarily valid for reasons including that people may say they intend to vote, but then do not.

You also have the fact that the question is repeated so often that one can combine polls.

The number of respondents required to meet a particular MOE (at 95% confidence) is very close to 1/(MOE*MOE); so for example, to get to 2%:

1/(.02 * .02) = 1/(.0004) = 2500. (2401 is the more accurate number, but this is close enough).

Thus, if you are given the MOE, you can compute how many people were in the poll, and from the results compute how many said Romney and how many said Obama. Then you can add up polls and have a “larger” poll, with a much smaller margin of error (which is computed as approximately 1/sqrt(N), where N is the total number of people surveyed).

More intuitively: after you see nine polls of similar size with basically the same result, the MOE is divided by 3, since sqrt(9) = 3.
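The arithmetic above can be sketched as follows. This is only an illustration: it assumes each poll is an independent simple random sample of the same population, taken close enough in time that opinion has not shifted, and it uses the more accurate 1.96 factor rather than the 1/sqrt(N) shortcut. The helper names are made up.

```python
from math import sqrt

Z95 = 1.96  # critical value for 95% confidence

def n_from_moe(moe):
    """Recover the approximate number of respondents from a reported MOE."""
    return round((Z95 / 2) ** 2 / moe ** 2)

def moe_from_n(n):
    """Worst-case (50/50 split) margin of error for n respondents."""
    return Z95 * sqrt(0.25 / n)

def pool(polls):
    """Combine polls given as (moe, share_a, share_b) into one larger poll."""
    total_n = 0
    count_a = 0.0
    count_b = 0.0
    for moe, share_a, share_b in polls:
        n = n_from_moe(moe)
        total_n += n
        count_a += share_a * n  # estimated respondents choosing candidate A
        count_b += share_b * n  # estimated respondents choosing candidate B
    return total_n, count_a / total_n, count_b / total_n, moe_from_n(total_n)

# Nine hypothetical polls, each 49-46 with a three-point MOE.
total_n, share_a, share_b, pooled_moe = pool([(0.03, 0.49, 0.46)] * 9)
print(f"N = {total_n}, A = {share_a:.1%}, B = {share_b:.1%}, MOE = {pooled_moe:.1%}")
```

Nine three-point polls pool to about 9,600 respondents and an MOE of about one point, which is the divide-by-3 rule in action.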