Tuesday, November 23, 2010

In Defence of Outshooting

David Johnson recently put up a post at his blog that examined the relationship between shot volume, shooting percentage and goals scored at even strength. Specifically, he determined each team's number of Fenwick shots (shots + missed shots) as well as their "Capitalization Ability" (goals scored/Fenwick shots) over the last three regular seasons, and how each variable correlated with goals scored over that same period. He then repeated the exercise with respect to Fenwick shots against, save percentage (goals against/Fenwick shots against) and goal prevention. After presenting his findings, he drew the following conclusion:
"The conclusion we can draw from these four charts is when it comes to scoring goals, having the ability to capitalize on opportunities (shots) is far more important than having the ability to generate opportunities (getting shots). Controlling the play and generating shots does not mean you’ll score goals (just ask any Maple Leaf fan), having the talent to capitalize on those opportunities is what matters most. From my perspective, this means the usefulness of ‘Corsi Analysis’ to be minimal, at least for the purpose of evaluating players and teams."
At first glance, Johnson's findings and conclusion seem sound enough. For example, if we determine each team's Fenwick differential and Fenwick PDO (Fenwick SH% + Fenwick SV%) over the last three seasons, and look at the correlation of each with each team's even strength goal differential over that same timeframe, the following values are obtained.*

*empty netters were removed from the sample

The results appear to support Johnson's conclusion. The average correlation between outshooting ability (as measured by Fenwick differential) and goal differential is weaker than the average correlation between [shooting + save percentage] and goal differential. As Johnson might put it, the ability to capitalize on opportunities and to prevent the opposition from doing likewise is more important than the ability to generate opportunities.
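For concreteness, the quantities compared above can be computed from a team's even strength totals roughly as follows. This is a minimal sketch: the `TeamTotals` container and the function names are hypothetical, and save percentage is taken as one minus goals against per Fenwick shot against, which is an assumption about the convention rather than something spelled out in the post.

```python
from dataclasses import dataclass


@dataclass
class TeamTotals:
    """Even strength totals for one team (hypothetical container)."""
    goals_for: int
    goals_against: int
    fenwick_for: int      # shots on goal + missed shots taken
    fenwick_against: int  # shots on goal + missed shots conceded


def fenwick_differential(t: TeamTotals) -> int:
    # Outshooting measure: Fenwick shots taken minus Fenwick shots conceded.
    return t.fenwick_for - t.fenwick_against


def fenwick_sh_pct(t: TeamTotals) -> float:
    # Goals scored per Fenwick shot taken.
    return t.goals_for / t.fenwick_for


def fenwick_sv_pct(t: TeamTotals) -> float:
    # Assumed convention: share of Fenwick shots against that did NOT score.
    return 1 - t.goals_against / t.fenwick_against


def fenwick_pdo(t: TeamTotals) -> float:
    # Fenwick PDO = Fenwick SH% + Fenwick SV%.
    return fenwick_sh_pct(t) + fenwick_sv_pct(t)


def goal_differential(t: TeamTotals) -> int:
    return t.goals_for - t.goals_against


# Made-up totals for illustration only (not real NHL data).
example = TeamTotals(goals_for=55, goals_against=50,
                     fenwick_for=1000, fenwick_against=1000)
```

Under this convention a team with league average percentages has a Fenwick PDO of exactly 1, so deviations from 1 summarise how far its percentages ran above or below average.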

However, Johnson's analysis suffers in that he fails to consider the impact of random variation upon the correlations that he adduces as evidence to support his position. For example, suppose that outshooting were the sole determinant of even strength goal differential, with the percentages merely reflecting the favour (or disfavour) of the hockey gods. If a full NHL regular season were played out under such conditions, we would still expect to observe:

a) a less than perfect correlation between outshooting and goal differential
b) a substantial correlation between the percentages and goal differential

This can be illustrated by simulating the last three NHL seasons a sufficiently large number of times and averaging out the results, using the following parameters:
• the number of shots taken by a team in any given simulation was the number of Fenwick shots taken by that team during the season simulated
• conversely, the number of shots taken against that team was the number of Fenwick shots it conceded during that season
• each team's probability of scoring a goal on any particular shot was the league average Fenwick shooting percentage in that particular season (~5.5%)
• similarly, the probability of conceding a goal on any particular shot was the same for all teams, again corresponding to the league average Fenwick shooting percentage during the season in question
• after each simulation, the correlation between Fenwick differential and even strength goal differential at the team level was determined and recorded, with the same then being done with respect to [Fenwick shooting percentage + Fenwick save percentage] and even strength goal differential
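The procedure in the list above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used for the post: the function name `simulate_correlations` is hypothetical, goals are drawn as independent binomial outcomes, and the shot totals are made-up numbers rather than real NHL data.

```python
import numpy as np


def simulate_correlations(fenwick_for, fenwick_against, league_sh_pct,
                          n_sims=1000, seed=0):
    """Simulate a season in which every Fenwick shot, for or against, scores
    with the same league average probability, so shot totals are the only
    real difference between teams.

    Returns the average (over simulations) team-level correlation of goal
    differential with (a) Fenwick differential and (b) Fenwick SH% + SV%.
    """
    rng = np.random.default_rng(seed)
    ff = np.asarray(fenwick_for)
    fa = np.asarray(fenwick_against)
    fenwick_diff = ff - fa

    r_shots, r_pcts = [], []
    for _ in range(n_sims):
        goals_for = rng.binomial(ff, league_sh_pct)      # pure luck per shot
        goals_against = rng.binomial(fa, league_sh_pct)  # same probability
        goal_diff = goals_for - goals_against
        pdo = goals_for / ff + (1 - goals_against / fa)
        r_shots.append(np.corrcoef(fenwick_diff, goal_diff)[0, 1])
        r_pcts.append(np.corrcoef(pdo, goal_diff)[0, 1])
    return float(np.mean(r_shots)), float(np.mean(r_pcts))


# Made-up shot totals for a 30-team league (NOT real NHL data).
_rng = np.random.default_rng(1)
fenwick_for = _rng.integers(2400, 3000, size=30)
fenwick_against = _rng.integers(2400, 3000, size=30)

r_shots, r_pcts = simulate_correlations(fenwick_for, fenwick_against, 0.055)
```

Because the shot totals here are invented, the two averages will not match the figures reported for the actual simulation; the sketch only shows the mechanics of drawing goals at a common probability and recording both correlations.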
The results:

So, in the three simulated seasons, the average correlation between Fenwick differential and even strength goal differential was 0.73, whereas the average correlation between [Fenwick shooting percentage + Fenwick save percentage] and even strength goal differential was 0.67. This is significant for two reasons.

Firstly, even in our imaginary world in which the only way for a team to control its goal differential is through generating and limiting shots, the correlation between the percentages and goal differential is effectively as large as the correlation between outshooting and goal differential. This despite the fact that teams have no ability to influence the former.

Secondly, the simulated values (0.73 and 0.67) are comparable to the actual values (0.54 and 0.61), suggesting that the underlying factors that dictate even strength goal differential in the real NHL are not too different from those that prevail in our simulated world. The relationship between the percentages and outscoring is slightly stronger, and the relationship between outshooting and outscoring slightly weaker, but that's to be expected. After all, we know that:

1. There is a skill component to both even strength shooting percentage and even strength save percentage at the team level.
2. Game score (whether a particular team is playing while tied, from behind or while leading) has an effect on both shot differential and the percentages. Over the course of a particular season, the amount of time played in each of these game states at even strength will vary from team to team.
3. Not all teams adopt the same strategy in relation to playing to the score.

The influence of these last two factors cannot be overstated. For example, if we repeat the above exercise, but use only data from when the score was tied at even strength, the actual results are essentially indistinguishable from the simulated results.

Therefore, the fact that there exists a strong relationship between the percentages and even strength goal differential over the course of a single regular season does not in any way negate the utility of Fenwick, Corsi or even strength shot differential as a measure of a team's ability level. Results at the NHL level are strongly subject to the influence of random variation, even over what might seem like a long period of time (i.e. a single NHL season). Losing sight of this fact - or ignoring it to begin with - can only lead to misguided analysis and flawed conclusions.

RyanV said...

I think another major criticism of his work is that he basically just showed that past shooting percentage is a good predictor of past goals for. Which shouldn't surprise anyone. What we want to find out, though, is whether past shooting percentage is a good predictor of future goals for. You can do that by looking at shooting percentage in the first half of a season and goals for in the second half, or even games versus odd games, or randomly split the games up however you like.

Vic's done all this, of course, and found that Corsi, Fenwick and shot differential are all much better predictors of future success than shooting percentage.

JLikens said...

Yeah, that's another angle that could be taken.

IIRC, the predictive validity of 1st half Fenwick differential in relation to EV goal differential over the remainder of the schedule is on the order of 0.3 - 0.4.

That may not seem like much, but it has to be considered in context. PDO over the 1st half of the schedule has very little repeat to the 2nd half, and its predictive validity is essentially nil.

Well done. I'm actually surprised you bothered to put up a refutation of this; as Ryan points out, doing the in-sample correlation is a major flaw and discredits pretty much anything else.

One danger of measuring only ES score-close is that it reduces the sample size, which reduces the predictive value of percentages but not of Fenwick. Percentages are more important than ES-close analysis implies, but your findings hold regardless.

JLikens said...

Thanks Tom.

I decided to post this in order to illustrate that a simple r^2 analysis, without more, can often lead to the wrong conclusions.

In terms of your other point, I assume you're referring to the second part of my post, where I used EV-tied data (the first part of the analysis used all even strength data, regardless of game score).

I agree that sample size can be a relevant consideration when using only data with the score tied.

For example, in conducting a coin-flipping experiment to determine the percentage of variance attributable to luck, one will invariably find that the percentage increases as the sample size decreases. So comparing the % of variance attributable to luck with respect to EV SH% over the course of a single season with the % of variance attributable to luck in relation to EV SH% with the score tied over the same period of time is, to some extent, comparing apples to oranges - the latter figure will be higher than the former, but part of that is the result of the reduced sample size.
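To put rough numbers on the coin-flipping point, here is a sketch under a simple binomial model (my assumption, and the symbols are introduced only for illustration): each team takes n shots, the average true shooting percentage is \bar{p}, and the true percentages vary between teams with variance \sigma^2_{talent}. Then, approximately:

```latex
\sigma^2_{\text{obs}} \approx \sigma^2_{\text{talent}} + \frac{\bar{p}\,(1-\bar{p})}{n},
\qquad
\text{luck share} \approx
\frac{\bar{p}\,(1-\bar{p})/n}{\sigma^2_{\text{talent}} + \bar{p}\,(1-\bar{p})/n}
```

The luck term shrinks like 1/n while the talent term does not depend on n, so cutting the number of shots raises the share of variance attributable to luck even if nothing about the teams has changed.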

I'm not sure if that concern arises here, however, due to the nature of the simulation conducted (i.e. comparing the simulated correlation to the actual correlation).

If I had to guess, I would say that EV outshooting truly is more important (in terms of the extent to which it is indicative of team talent) when the score is tied, and that its elevated importance reflects a qualitative difference between the way that teams play with the score tied and the way that they play otherwise.

he did a lot of research to make that kind of post. there are many variables to account for.