I'm not a pollster, nor a statistician. Perhaps like you, I am quite bemused by the crazy and vast differences we see in all the various election polls - though I note that in the last few days a lot of these polls have tightened, showing Obama's lead shrinking.
I'm a huge fan of RealClearPolitics. They produce, among lots of other great things, an important number that many in the media follow - the RCP national average of polls - a very helpful number, and in theory a great idea: average all of the best polls so as to arrive at a usable consensus number. One number to rule them all.
But I have a problem with their math, or at least with which polls go into it.
It started when I wondered why their national poll average has not come down more, given that most of the major polls now report a smaller Obama lead - from, say, 9% to 3%. That is a huge drop for Obama, but RCP's average has not come down much, nor has their electoral map really changed in the last few weeks - currently, it looks like an Obama blow-out.
But in looking at their source data for their average, one can see that they use 10 polls. The problem is that they are including a "statistical outlier". The data from Pew Research, as listed on RCP's site, shows Obama with a 15-point advantage. That's about twice the next highest poll, and 5 times the size of a few others in the formula.
The WaPo has a timely article out that discusses the Pew poll:
A Pew Research Center poll released yesterday shows a 15-point lead for Obama, a result based on relaxed criteria for when to consider an African American respondent a likely voter, said Andrew Kohut, president of the center. He said the poll shows that roughly 12 percent of the electorate this year is black, up from 2004, with a similar increase among younger voters. Kohut defended this approach, saying there are historically high levels of interest in this contest among both demographic groups. At the same time, he added, "we've consistently shown less enthusiasm and engagement among Republicans than is typical, and the composition of the electorate shows that."
So the Pew poll assumes that the makeup of this year's electorate will be very different from 2004, due to lower Republican turnout and much higher turnout among the young and African Americans.
So, should the Pew poll even be included in a national average? I'm not accusing RCP of any bias; I'm just saying you get vastly different national poll numbers when that Pew poll is excluded.
In doing the math myself, I find that the national average, including Pew, shows a 5.9% lead for Obama. That is slightly below the "6.0%" that RCP reports.
Now, excluding the Pew poll (which, again, is way higher than the others in their formula), I get 4.9%.
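For anyone who wants to check that figure without retyping all ten polls, here is a rough sketch in Python. It uses only the numbers quoted above (10 polls, a 5.9-point average, Pew at 15 points) and assumes a plain unweighted average, which may not be exactly how RCP computes theirs. The function name `mean_without` is my own, not anything from RCP.

```python
def mean_without(mean_all: float, n: int, excluded: float) -> float:
    """Mean of the remaining n - 1 values after dropping one value from a set of n.

    Assumes a simple unweighted average (an assumption, not RCP's stated method).
    """
    if n < 2:
        raise ValueError("need at least two values to drop one")
    # Recover the total from the mean, subtract the dropped value, re-average.
    return (mean_all * n - excluded) / (n - 1)


# Figures quoted in the post: 10 polls, 5.9-point average, Pew at 15 points.
avg_without_pew = mean_without(5.9, 10, 15.0)
print(round(avg_without_pew, 1))  # 4.9
```

In other words, the ten polls sum to 59 points; take away Pew's 15 and the remaining nine sum to 44, which averages out to about 4.9.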
4.9% is certainly a smaller advantage for Obama than 6%, especially when one considers that the average margin of error could be 3%. Meaning, once Pew is excluded, Obama's "6%" lead could really be only 1.9% (4.9% minus 3%). And that's pretty close to a dead heat.
(If anyone from RCP is reading this, I'd appreciate your comments and any clarification you can provide for your formula.)