Saturday, October 23, 2010

Corsi Corrected for Schedule Difficulty

While another year of hockey is finally underway, the 2010-11 season is still very much in its infancy. The schedule has yet to reach the 100-game mark, with no team having played more than a handful of games. This being the case, drawing conclusions on the basis of the results thus far can be difficult. The sample size with which we have to work just isn’t large enough.

To illustrate this, consider the league standings as of Friday, October 22nd, following the completion of the 97th game. The range in standings points is 7, with a standard deviation of 2.08. Assigning each team the same winning percentage, setting home advantage at 5%, giving each game a 22% chance of going past regulation, and simulating the first 97 games of the schedule 1000 times under these conditions yields the following.

In other words, virtually all of the team-to-team variation in standings points at this stage of the year is the product of randomness. (For interest’s sake, only about half of the variation in standings points over the course of an entire season can be accounted for by luck.)
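For readers who want to try something similar, here is a minimal sketch of that kind of simulation. It is not the program used for this post. Several things in it are assumptions: the schedule is a randomly generated placeholder rather than the actual first 97 games, the 5% home advantage is read as a 55% chance of the home team winning, and every name (simulate_standings, random_schedule, simulated_spread) is made up for illustration.

```python
import random
import statistics

def simulate_standings(schedule, n_teams, home_edge=0.05, p_overtime=0.22, rng=None):
    """Play through `schedule` once with equally skilled teams.

    schedule: list of (home, away) team indices.
    Returns a list with each team's standings points.
    """
    rng = rng or random.Random()
    points = [0] * n_teams
    p_home_win = 0.5 + home_edge  # assumption: one reading of "5% home advantage"
    for home, away in schedule:
        home_wins = rng.random() < p_home_win
        winner, loser = (home, away) if home_wins else (away, home)
        points[winner] += 2
        # The loser earns a point only when the game goes past regulation.
        if rng.random() < p_overtime:
            points[loser] += 1
    return points

def random_schedule(n_games, n_teams, rng):
    """Placeholder schedule: random pairings, NOT the real NHL schedule."""
    return [tuple(rng.sample(range(n_teams), 2)) for _ in range(n_games)]

def simulated_spread(n_sims=1000, n_games=97, n_teams=30, seed=0):
    """Average standard deviation and range of points across simulated standings."""
    rng = random.Random(seed)
    schedule = random_schedule(n_games, n_teams, rng)
    sds, ranges = [], []
    for _ in range(n_sims):
        pts = simulate_standings(schedule, n_teams, rng=rng)
        sds.append(statistics.pstdev(pts))
        ranges.append(max(pts) - min(pts))
    return statistics.mean(sds), statistics.mean(ranges)
```

Comparing the simulated spread with the observed one (a standard deviation of 2.08 and a range of 7) gives a rough sense of how much of the observed variation chance alone could produce. Swapping in the real schedule would make the comparison fairer, since teams have not all played the same number of games.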

As shots in hockey are relatively frequent events, it makes much more sense to rely on a shots-based metric in order to get a sense of how each team has performed thus far. But which metric in particular ought to be used? And which adjustments, if any, are necessary?

Corsi – which includes all attempted shots and therefore better attenuates any sample-size concerns – serves as the best fit for this exercise, as opposed to either Fenwick or shot ratio proper. However, two adjustments are necessary. Firstly, playing-to-the-score effects – which are known to bias the shot clock in favour of the trailing team – ought to be controlled for as much as possible. This is especially true this early in the season, as some teams will have played with the lead for much longer periods than others. Ideally, one would restrict the sample to shots attempted at even strength with the score tied in order to get around this problem. However, because of the sample-size concern identified above, using Corsi with the “score close” – defined as whenever the score is within one goal in the first or second period, or tied in the third period or overtime – is to be preferred.
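The “score close” definition above is easy to express in code. The following is a small sketch, assuming a hypothetical data layout in which each even-strength shot attempt is a tuple of (shooting team, period, shooting team’s goal differential at the time), with overtime coded as period 4 or higher. The function names are illustrative only.

```python
def is_score_close(period, goal_diff):
    """True when the game state counts as "score close".

    period: 1, 2, 3, or 4+ for overtime.
    goal_diff: goals for minus goals against at the time of the attempt.
    """
    if period <= 2:
        return abs(goal_diff) <= 1  # within one goal in the first or second period
    return goal_diff == 0           # tied in the third period or overtime

def corsi_close_pct(attempts, team):
    """Share of score-close shot attempts taken by `team`.

    attempts: iterable of (shooter, period, shooter_goal_diff) tuples
    (a hypothetical layout; real play-by-play data needs parsing first).
    Returns None when there are no score-close attempts.
    """
    corsi_for = corsi_against = 0
    for shooter, period, goal_diff in attempts:
        if not is_score_close(period, goal_diff):
            continue
        if shooter == team:
            corsi_for += 1
        else:
            corsi_against += 1
    total = corsi_for + corsi_against
    return corsi_for / total if total else None
```

Because the test on the goal differential is symmetric, it does not matter whose perspective the differential is recorded from, as long as it is consistent with the attempt it is attached to.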

Secondly, a team’s Corsi depends not only on its own ability to outshoot the opposition at even strength, but also on the ability of its opponent in this respect. At this point in the year, few teams have played what could be reasonably described as a balanced schedule. Thus, some sort of correction for strength of schedule should be incorporated to account for the fact that some teams have faced stronger or weaker opponents with respect to Corsi.
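There is more than one way to make such a correction, and the exact method behind the numbers in this post is not spelled out here. As an illustration only, the sketch below uses an iterative approach in the spirit of a simple rating system: each team’s rating is its average per-game Corsi margin plus the average rating of the opponents it faced, repeated until the ratings settle. The input layout, the function name, the damping factor and the use of per-game averages (rather than pooled shot counts) are all assumptions.

```python
from collections import defaultdict

def schedule_adjusted_corsi(games, n_iter=200, damping=0.5):
    """Adjust Corsi percentage for opponent strength (illustrative sketch).

    games: list of (team_a, team_b, corsi_pct_a), where corsi_pct_a is
    team_a's score-close Corsi share in that game; team_b's share is
    taken to be 1 - corsi_pct_a.
    Returns {team: adjusted Corsi percentage}.
    """
    # team -> list of (opponent, own share above 50%) for each game played
    results = defaultdict(list)
    for a, b, pct_a in games:
        results[a].append((b, pct_a - 0.5))
        results[b].append((a, 0.5 - pct_a))

    # Ratings are expressed as adjusted share minus 0.5 and start at zero.
    rating = {team: 0.0 for team in results}
    for _ in range(n_iter):
        updated = {}
        for team, played in results.items():
            # Own margin plus the opponent's current rating, averaged over games.
            target = sum(margin + rating[opp] for opp, margin in played) / len(played)
            # Damping keeps the iteration from oscillating on small schedules.
            updated[team] = damping * rating[team] + (1 - damping) * target
        rating = updated
    return {team: 0.5 + r for team, r in rating.items()}
```

For example, if A posts a 60% share in its only game against B, the sketch rates A at 55% and B at 45%: part of A’s edge is put down to having faced a weak opponent. With so few games played, any correction of this kind is itself noisy.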

The table below shows each team’s Corsi percentage with the score close and how that changes once an adjustment for strength of schedule is applied.

It’s important to note that, at this point in the year, roughly 43% of the team-to-team variation in Corsi percentage with the score tied (raw, not adjusted) can be attributed to luck. Accordingly, some teams will see their ranking change significantly between now and the season’s end. If forced to predict, I’d wager that, relative to underlying talent, the Devils, Bruins, Capitals, Blackhawks and Sharks are better than these rankings suggest. Conversely, I’d wager that the Avalanche, Canadiens, Rangers, Panthers and Flyers are worse.
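One common way to arrive at a luck figure of this kind is to compare the observed spread in team percentages with the spread that sampling noise alone would produce. The sketch below does this under the assumption that each shot attempt is an independent coin flip weighted by the team’s observed percentage. This may not be how the 43% figure was calculated, and the function name and inputs are hypothetical.

```python
import statistics

def luck_share(corsi_for, corsi_against):
    """Fraction of observed variance in Corsi percentage expected from chance.

    corsi_for, corsi_against: per-team shot-attempt counts, in the same order.
    Assumes attempts are independent Bernoulli trials (a simplification).
    """
    totals = [f + a for f, a in zip(corsi_for, corsi_against)]
    pcts = [f / n for f, n in zip(corsi_for, totals)]
    observed_var = statistics.pvariance(pcts)
    # Binomial sampling variance of each team's percentage, averaged over teams.
    noise_var = statistics.mean([p * (1 - p) / n for p, n in zip(pcts, totals)])
    return noise_var / observed_var
```

With two teams at 60 for and 40 against, and 40 for and 60 against, the observed variance is 0.01 and the expected noise variance is 0.0024, so the share comes out to 0.24. In very small samples the ratio can exceed 1, which simply means the observed spread is no wider than chance would predict.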

Anonymous said...

where do you get your data? I am looking for all game outcomes from the 09-10 season. Any ideas? Great blog btw!

JLikens said...

Thanks.

I scraped it from the play-by-play game sheets on NHL.com using an Excel web query.

I then wrote a program in Excel to obtain each team's Corsi with the score close. And then another program to correct for schedule difficulty.

I have some individual game data from the 09-10 season (basically goals, shots, missed shots and blocked shots, broken down by game situation (EV, PP, SH), all at the team level).

I'm not sure if that's what you're looking for, but I'd be happy to send you the file if you're interested.

