Saturday, December 31, 2011

Breaking the Ice

A couple of days ago, I read a post on Futility Closet about the Nenana Ice Classic. At the beginning of every year, the people of Nenana, Alaska, set a tripod on the frozen Tanana River. They hold a contest to see who can guess the correct date and time that the tripod starts to fall due to the melting ice.

Upon hearing about this, I decided to see if there was a way to statistically predict when the tripod might fall. I found the break up log and started to look at the data.



Since 1917, the ice break has happened sometime between April 20th and May 20th. The average date is May 4th. However, as the Futility Closet post points out, the National Snow and Ice Data Center records the event to measure climate change, which suggests that the date is changing over time.



The graph shows a negative trend: the ice break has generally been happening earlier in recent years than in the past. According to the trend line, the expected date of the ice break has moved a week earlier since 1917, from May 9th to May 2nd. Of course, the R-squared is rather small, 0.12, meaning that the year explains only about 12% of the variation in ice break dates. The rest is year-to-year scatter around the trend.
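For anyone who wants to reproduce this kind of trend line, here is a minimal sketch of an ordinary least-squares fit that returns the slope, intercept, and R-squared. The function name `linear_trend` and the toy numbers are my own placeholders, not the actual break up log; to use it for real, you would feed in the years and the day-of-year values from the log.

```python
from statistics import mean

def linear_trend(years, days):
    """Ordinary least-squares fit of day-of-year against year.

    Returns (slope, intercept, r_squared).
    """
    x_bar, y_bar = mean(years), mean(days)
    sxx = sum((x - x_bar) ** 2 for x in years)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, days))
    syy = sum((y - y_bar) ** 2 for y in days)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # For a one-variable fit, R-squared is the squared correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

# Placeholder values for illustration only, NOT the real Nenana data.
toy_years = [1, 2, 3, 4, 5]
toy_days = [129, 125, 128, 124, 126]  # day of year; 129 is May 9th in a non-leap year

slope, intercept, r2 = linear_trend(toy_years, toy_days)
print(f"slope = {slope:.2f} days/year, R-squared = {r2:.2f}")
```

A negative slope means earlier break ups over time, and an R-squared near zero means the trend explains little of the scatter, which is the situation described above.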

Of course, one doesn't win just by guessing the right date. The time has to be right too. I included the time in the previous estimates to get the average dates, so I could just use the time they produce as the estimate. However, I noticed another pattern when looking at just the times.



The ice break rarely happens late at night or early in the morning. It's not even a normal distribution. No ice breaks have ever occurred between 7 and 9 am, but then there have been a lot just after 9. The best explanation is that the ice is most likely to break when the sun is there to warm it up. If the ice doesn't break by sunset, it is unlikely to break that night.

So my prediction for the most statistically probable date and time of the ice break is May 2nd at 2:11pm. Let's see how I do.

Monday, December 26, 2011

Presidential Election Statistics

I've played around with different ways of predicting presidential elections in the past, but recently I started thinking in a different way. One reason is that when I applied my methods for predicting elections, they produced results no more accurate than just assuming the next election would be exactly the same as the previous one.

With that in mind, I decided to look at past elections and, based on historic swings in the vote, come up with a prediction for the chances a particular state will vote for a particular party. On average, the swing between Republican and Democrat is 8.6 percentage points, with a standard deviation of 6.6 percentage points. To put it another way, 95% of the time, the vote swing in any individual state is between 0 and 21.5 percentage points. What this means is that, if one party won a state by more than 21.5 percentage points in the last election, the chances that it will vote for a different party in the next election are less than 5%.
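The 21.5 figure can be checked with a quick sketch, assuming the swings are roughly normally distributed (that assumption, and the helper name `chance_swing_exceeds`, are mine for illustration). The upper bound is the mean plus about 1.96 standard deviations. The matching lower bound comes out negative, which is why the range is floored at 0.

```python
from statistics import NormalDist

MEAN_SWING = 8.6  # average swing in percentage points (from the post)
SD_SWING = 6.6    # standard deviation in percentage points (from the post)

# z-value that leaves 2.5% in the upper tail of a standard normal (~1.96)
z = NormalDist().inv_cdf(0.975)
upper_bound = MEAN_SWING + z * SD_SWING

def chance_swing_exceeds(margin):
    """Probability that a swing is larger than `margin` points,
    under the assumed normal model."""
    return 1.0 - NormalDist(MEAN_SWING, SD_SWING).cdf(margin)

print(f"upper bound: {upper_bound:.1f} points")
print(f"chance of a swing above 21.5 points: {chance_swing_exceeds(21.5):.3f}")
```

Note that a swing only flips a state if it is both larger than the previous margin and in the right direction, so this tail probability is, if anything, an overestimate of the chance of a flip.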

Based on the 2008 results, I created a map. It's a bit messy, but should be easy enough to figure out. The colors represent the party most likely to win each state, and the darkness of the color corresponds to the chances that party will win the state, assuming the 2012 result is totally random.



Looking at just the darkest states, those at a 99% confidence level, one should notice something. Some of the darkest blue states are really big, but the darkest red states are all rather small (in population at least). Democrats have done very well recently in the most populous states. An election could be won with just the 11 largest states. Of those, Obama won 9 in the last election. Basically, no matter what happens in 2012, Democrats are virtually guaranteed to win 149 electoral votes (out of 270 needed to win) and 3 of the 11 largest states. On the other hand, Republicans are only guaranteed 20 electoral votes. From the states that the Republicans have more than a 90% chance of winning, they only get 76 electoral votes, which is less than California and New York combined. Of the big states, Republicans only have an 84% chance of keeping Texas and a 65% chance of keeping Georgia. Basically, Democrats are going into the 2012 election with a huge built in lead. Also, since Democrats have such a commanding lead in the large states, they need to defend and win fewer states.

Of course, 2008 was a massive victory for the Democrats, so it seems unlikely that they will maintain the same level of support. The 2012 result therefore won't be completely random; there will most likely be a swing to the right. So let's assume an average swing of 8.6 percentage points to the right.



Suddenly, a lot of states become solidly Republican. However, they reveal a weakness. Democrats still win 3 big states with 99% confidence, while Republicans win only 1. Even though Republicans win 19 states with 99% confidence compared to the Democrats' 9 plus DC, those 19 states represent only 155 electoral votes, compared to 142 for the Democrats. Basically, even assuming that every state swings to the right, Republicans aren't guaranteed a win, as the Democrats have too much solid support from California, Illinois, and New York.

For Republicans to have a solid lead over the Democrats in the electoral college, there would have to be a uniform swing to the right in excess of 9 percentage points. What's intriguing is that with a 9-point national swing, the Republican candidate would win 55% of the popular vote. This potentially means that in 2012, if the Republican candidate wins between 50% and 55% of the popular vote, they risk losing the electoral college.

Tuesday, December 20, 2011

Republican Flavor of the Month

Much has already been said about Republicans' brief love affairs with various candidates. First there was Romney, then anybody but Romney, which led to Bachmann, Perry, Cain, Gingrich, and now perhaps the greatest dark horse of all, Paul.

What gets me is how regular the timing of each candidate's surge and sudden fall in the polls has been.





Nationally, Romney led every poll from the beginning of the year to August 9th.
Perry led from August 15th to September 25th.
Cain and Romney fought from September 25th to November 11th.
Gingrich has led since November 11th.

Broken down, that is 42 days for Perry, 48 days for Cain, and so far 39 days for Gingrich, and things aren't looking good for him.

Iowa is even more interesting.
Similarly, Romney led early, but only until June 22nd.
Bachmann led from June 26th to August 4th.
Perry led from August 19th to August 31st.
There were no polls in September.
Cain led from October 7th to November 13th.
Gingrich led from November 15th to December 12th.
Paul has led since December 18th.

That's 40 days for Bachmann, 13 days for Perry (followed by a 37 day gap in polling), 38 days for Cain, and 28 days for Gingrich.
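These Iowa counts include both the first and last day of each lead, while the polling gap is the plain difference between the two dates. Here is a small sketch that reproduces them from the dates listed above (the helper name `days_on_top` is just mine for illustration):

```python
from datetime import date

def days_on_top(first, last):
    """Length of a lead in days, counting both the first and last day."""
    return (last - first).days + 1

# Iowa polling leads, using the dates listed in the post.
iowa_leads = {
    "Bachmann": (date(2011, 6, 26), date(2011, 8, 4)),
    "Perry": (date(2011, 8, 19), date(2011, 8, 31)),
    "Cain": (date(2011, 10, 7), date(2011, 11, 13)),
    "Gingrich": (date(2011, 11, 15), date(2011, 12, 12)),
}

durations = {name: days_on_top(start, end)
             for name, (start, end) in iowa_leads.items()}

# Gap between Perry's last lead and Cain's first, as a plain date difference.
polling_gap = (date(2011, 10, 7) - date(2011, 8, 31)).days

print(durations)
print("polling gap:", polling_gap)
```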

The turnover rate for candidates appears to be just over 40 days nationally, while Iowa voters appear to be a bit more fickle. They managed to put Bachmann and Paul on top, which hasn't happened nationally, at least not yet. From December 18th to January 3rd, the date of the Iowa Caucus, is only 17 days. If the trend holds, this will be right around the apex of Paul's support.

What is interesting to note is that, according to the RCP average (the average of recent polls according to RealClearPolitics), Romney has stayed in first or second position nationally and has only rarely fallen to third in Iowa. He often shows up first in the polls during the transition from one fad candidate to the next. Relative to all the other candidates, his support has remained remarkably stable. This could be good or bad. Unlike the other candidates, he has been able to maintain steady support, which will likely carry him through the primary. However, it appears he has very little soft support. Given how things have been going, it seems that Romney will second-place his way to the Republican nomination.