## Thursday, March 17, 2011

### Loose Ends - Part I: Predicting Future Success

My plan is to put out a series of posts - hopefully all within the next while - that relate to subjects that I've posted on previously. The object of these posts is to address certain outstanding issues that weren't resolved when I tackled these subjects the first time around.

The first post in the series is an extension of a post that I published last month that looked at how various shot metrics - all of them calculated at even strength with the score tied - predicted future success at the team level.

One related issue that wasn't explored is how well those same shot metrics predict future success when compared to more conventional measures of team strength, such as winning percentage and goal ratio.* This question is actually more fundamental than the one investigated in the original post. After all, if shot metrics like Fenwick and Corsi failed to predict future success better than the conventional measures, then that would render them considerably less useful.

The method employed** was similar to the one used in the first post. Because of the relative complexity of the process, including a step-by-step description may be helpful.

Firstly, I randomly selected a certain number of games from each team's schedule, with each team having an equal number of home and road games selected.

Secondly, I calculated how each team performed over those games with respect to certain variables. The variables that were calculated were even strength Corsi with the score tied, overall goal ratio (with empty net and shootout goals excluded), and winning percentage. Winning percentage was defined as WINS/(WINS+LOSSES). Games that ended in a shootout were considered ties, and were therefore not included in the calculation.

I then randomly selected a second, independent group of games. That is, if a game was included in the first grouping, it was not eligible for selection in the second grouping. As with the first grouping, an equal number of home and road games were selected for each team.

I then determined how each team did in terms of winning percentage over this second group of games, and looked at how each of the three variables calculated in relation to the first group correlated with winning percentage in the second group.

The relationship between the size of the two groups can be expressed as y=(80-x), where x represents the number of games included in the first group, and y the number of games in the second group. So, for example, if 20 games were selected for the first group, the second group would consist of 60 games. Ultimately, I elected to use x values of 20, 30, 40, 50, 60 and 70.
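For readers who prefer code, the steps above can be roughed out in Python. This is only a sketch under stated assumptions, not the original implementation: the data is synthetic (`fake_season` is a made-up generator, not real NHL results), every function name is illustrative, and the formulas for goal ratio and Corsi are assumed to be simple shares (for / (for + against)), which the post does not spell out.

```python
import random
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def win_pct(games):
    """WINS / (WINS + LOSSES); shootout games count as ties and are skipped."""
    wins = sum(g["result"] == "W" for g in games)
    losses = sum(g["result"] == "L" for g in games)
    # Assumption: fall back to 0.5 if every sampled game went to a shootout.
    return wins / (wins + losses) if wins + losses else 0.5

def goal_ratio(games):
    """Assumed form GF / (GF + GA); the post does not give the formula."""
    gf = sum(g["gf"] for g in games)
    ga = sum(g["ga"] for g in games)
    return gf / (gf + ga)

def corsi_tied(games):
    """Assumed form CF / (CF + CA) for EV shot attempts with the score tied."""
    cf = sum(g["cf"] for g in games)
    ca = sum(g["ca"] for g in games)
    return cf / (cf + ca)

def split_games(games, x, rng, total=80):
    """Two disjoint random groups of x and total - x games, each half home, half road."""
    half_x, half_y = x // 2, (total - x) // 2
    home = rng.sample([g for g in games if g["home"]], half_x + half_y)
    road = rng.sample([g for g in games if not g["home"]], half_x + half_y)
    return home[:half_x] + road[:half_x], home[half_x:] + road[half_x:]

METRICS = {"W%": win_pct, "Goal ratio": goal_ratio, "Corsi Tied": corsi_tied}

def average_correlations(season, x, trials, rng):
    """Mean correlation of each first-group metric with second-group W%."""
    totals = dict.fromkeys(METRICS, 0.0)
    for _ in range(trials):
        past = {name: [] for name in METRICS}
        future = []
        for games in season:  # one list of games per team
            first, second = split_games(games, x, rng)
            for name, metric in METRICS.items():
                past[name].append(metric(first))
            future.append(win_pct(second))
        for name in METRICS:
            totals[name] += pearson(past[name], future)
    return {name: total / trials for name, total in totals.items()}

def fake_season(rng, n_teams=30, n_games=82):
    """Synthetic stand-in data: NOT real NHL results."""
    season = []
    for _ in range(n_teams):
        skill = rng.gauss(0.5, 0.03)  # hypothetical share of play a team controls
        games = []
        for i in range(n_games):
            cf = sum(rng.random() < skill for _ in range(30))  # 30 attempts per game
            gf = sum(rng.random() < skill for _ in range(6))   # 6 goals per game
            result = "W" if gf > 3 else "L" if gf < 3 else "SO"
            games.append({"id": i, "home": i < n_games // 2, "result": result,
                          "gf": gf, "ga": 6 - gf, "cf": cf, "ca": 30 - cf})
        season.append(games)
    return season

if __name__ == "__main__":
    rng = random.Random(2011)
    season = fake_season(rng)
    for x in (20, 30, 40, 50, 60, 70):
        print(x, average_correlations(season, x, trials=50, rng=rng))
```

Drawing 40 home and 40 road games per team in one shot and then cutting each list at x/2 is just a convenient way to guarantee that the two groups never share a game. The numbers it prints say nothing about real teams, since the inputs are invented.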

The raw data used was from the 2007-08, 2008-09 and 2009-10 regular seasons. The table included below shows the results for each individual season, as well as the average results. The values represent the average correlation over 1000 calculations.

A couple points:

- Corsi Tied is the best predictor of how a team will perform over the remainder of its schedule, regardless of the point in the schedule at which the calculation occurs.

- Corsi Tied is only marginally more predictive of future success than goal ratio or winning percentage when looking at samples of 60 games or more. In other words, as the sample size becomes increasingly large, there are diminishing returns with respect to the predictive advantage of Corsi. By the end of the season, all three variables seem to predict future success equally well.

- The above fact has implications in terms of determining playoff probabilities at the team level, with the results suggesting that a composite metric would work best.

- The aggregate values for Goal Ratio and Winning Percentage are remarkably similar. The implication is that once shootout results are controlled for, winning percentage is as good a measure of a team as goal ratio is.

Next up: Score Effects and Minor Penalties.

*Some readers may have observed that the split-half reliability of goal ratio (0.417) was lower than the predictive validity coefficients for both Corsi Tied (0.444) and Fenwick Tied (0.429). The implication is that the two latter variables are better able to predict goal ratio from one half of the schedule to the other than goal ratio itself is.

** I should note that this method was actually developed and first used by Vic Ferrari. See here.

Addendum

Scott Reynolds had a question in the comments section on how the results would differ if we looked at future EV performance rather than overall performance. Using the same method as the one described above, I looked at which of EV Corsi Tied and EV goal ratio (empty netters removed) was better able to predict future performance at even strength (which I operationalized as future EV goal ratio). Here are the results:

The results aren't too different - Corsi Tied is a much better predictor early in the schedule, but the two measures have about the same predictive power by the end of the year.


## 8 comments:

Good post! If I understand you correctly, the Corsi Tied metric is including only EV results, whereas the goal differential metric is including all goals, both at EV and on ST. It would be interesting to see whether using EV GD would have any impact on the results.

The other thing that stands out is that these correlations aren't as strong as I had expected (although where that expectation came from, I'm not quite sure).

Thanks Scott.

"If I understand you correctly, the Corsi Tied metric is including only EV results, whereas the goal differential metric is including all goals, both at EV and on ST."

That's correct.

"It would be interesting to see whether using EV GD would have any impact on the results."

I actually ran those numbers as well. In particular, I looked at which of EV goal ratio and EV Corsi Tied was better able to predict future even strength outscoring.

I'll post an addendum.

"The other thing that stands out is that these correlations aren't as strong as I had expected (although where that expectation came from, I'm not quite sure)."

Yeah, predicting future performance is difficult.

Early in the season, the sample over which future performance can be assessed is large. However, differentiating between the good and bad teams is hard.

By the end of the year, the reverse is true - we're much better able to distinguish the good teams from the bad ones, but predicting future performance is still difficult because there's just not a lot of hockey to be played at that point.

Thanks for posting the addendum. It's very surprising to me that the EV Corsi results have a stronger correlation to overall performance than they do to strictly EV performance. Did not expect that at all.

"Corsi Tied is only marginally more predictive of future success than goal ratio or winning percentage when looking at samples of 60 games or more. In other words, as the sample size becomes increasingly large, there are diminishing returns with respect to the predictive advantage of Corsi. By the end of the season, all three variables seem to predict future success equally well."

Unless I'm misunderstanding something, isn't the more likely explanation that the future W% becomes more dispersed relative to talent when the sample decreases in size? The correlation for Corsi falls as you increase the sample for Corsi but decrease the sample for future winning percentage.

So instead of larger samples of W% and EV GD being better predictors (their correlation coefficients decrease as the sample gets bigger) it's that future results become more random with smaller samples?

Scott:

While the result may seem counterintuitive, two things have to be considered.

1. The lower reliability of EV goal ratio, relative to overall goal ratio.

2. The fact that even strength ability and special teams ability are correlated skills at the team level.

In fact, in relation to the first factor, this limitation can be accounted for by determining the true correlation between even strength outshooting (as measured by EV Tied shot ratio) and even strength outscoring (as measured by EV goal ratio).

Using the first five post-lockout seasons as our sample, that correlation would be approximately 0.87. By comparison, the true correlation between EV Tied shot ratio and overall goal ratio is approximately 0.80 (as calculated in this post).

So it appears that even strength outshooting is more closely tied to even strength outscoring than overall outscoring, once the imperfect reliability of the involved variables is accounted for.
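For anyone wondering what "true correlation" means here: assuming it refers to the standard correction for attenuation (the observed correlation divided by the square root of the product of the two reliabilities), the calculation can be sketched as below. The function name and the inputs in the example are purely illustrative; they are not the values behind the 0.87 and 0.80 figures, which aren't given in this thread.

```python
from math import sqrt

def disattenuate(r_xy, rel_x, rel_y):
    """Observed correlation corrected for the unreliability of both variables.

    r_xy  : observed correlation between the two measures
    rel_x : reliability of the first measure (e.g. split-half)
    rel_y : reliability of the second measure
    """
    return r_xy / sqrt(rel_x * rel_y)

# Illustrative inputs only (not taken from the post).
print(round(disattenuate(0.5, 0.64, 0.81), 3))  # 0.694
```

The less reliable a variable is, the more its observed correlations understate the underlying relationship, which is why the lower reliability of EV goal ratio matters for the comparison above.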

Michael:

I see what you're saying.

You're absolutely correct that future results become more random as the sample size decreases.

However, this can't explain how the predictive power of the variables changes relative to one another.

This can be illustrated by randomly selecting 20 games for each team, looking at each of three variables over that sample, and then determining how each variable predicts future winning over a second, independently selected 20 game sample.

We can then compare those results to how well each of the variables predicted future results in the "60=>20" grouping.

If EV GD and W% *do not* become better predictors of future winning (relative to EV Corsi Tied) as the sample size increases, then the results in the "20=>20" grouping should roughly match those in the "60=>20" grouping.

If, however, EV GD and W% *do* become better predictors (again, relative to EV Corsi Tied) as sample size increases, then EV Corsi Tied should have more predictive power, relative to the other two variables, in the "20=>20" grouping than in the "60=>20" grouping.

Using the 2009-10 season as our sample, I got the following:

60=>20 (from table in post)

W%: 0.366

GR: 0.358

Corsi T: 0.346

20=>20

W%: 0.147

GR: 0.177

Corsi T: 0.275

Corsi Tied has much more relative predictive power in the second grouping than in the first, indicating that it's not just a matter of results becoming more variable as the sample size decreases. Rather, goal ratio and winning percentage become relatively stronger predictors as the season moves forward and the amount of information we have about each team grows.
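One way to put a number on "relative predictive power" is to divide the Corsi Tied correlation by each of the other two within a grouping. That choice of ratio is just an illustration (other comparisons, such as differences, would work too); the inputs are the 2009-10 values listed above.

```python
# Correlations with future W%, copied from the two groupings above.
groupings = {
    "60=>20": {"W%": 0.366, "GR": 0.358, "Corsi T": 0.346},
    "20=>20": {"W%": 0.147, "GR": 0.177, "Corsi T": 0.275},
}

def corsi_advantage(row):
    """Ratio of the Corsi Tied correlation to each other metric's correlation."""
    return {name: row["Corsi T"] / r for name, r in row.items() if name != "Corsi T"}

for label, row in groupings.items():
    print(label, {name: round(v, 2) for name, v in corsi_advantage(row).items()})
# 60=>20 {'W%': 0.95, 'GR': 0.97}
# 20=>20 {'W%': 1.87, 'GR': 1.55}
```

A ratio near 1 means the metrics are about equally predictive; the 20=>20 ratios are well above 1, while the 60=>20 ratios sit just below it.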

I have to admit that I really like making predictions, and the best thing about it is being accurate.

Hey. I just want to add my 2 cents about sample size and prediction quality.

-Smaller training set favors Corsi over W% and GR

-Larger training set reduces the differences among metrics

-Smaller test set reduces the accuracy of all predictions

Taking these observations together, I think averaging out randomness is the most important factor here. Corsi data has more hidden samples than goals or wins. Games occur on a time-scale of days, goals on a time-scale of minutes, and possession-changes on a time-scale of seconds. So you can say "I averaged 20 Corsi values and 20 GR values", but the values you started with were already averages.

As you increase the size of the training set, you increase the number of hidden samples for all of the metrics, which seems to reduce the differences among them.

So if you want to know which metric is "best", it depends on your situation.

If you need to train your prediction on a small number of games, because a season just started, or a line just changed, then go with Corsi.

If you have an entire season of data to work with, then it would seem that GR and W% are just as good or maybe better.

I think you could get a better answer with an n-fold cross-validation. That means you select the training and test sets randomly many times and average the results. This lets you use a small test-set size and still get reliable results. For instance, you could train on 75 games and test on 5. Then reselect the training and test sets 10 times and average the results. I suspect you will see W% and/or GR surpass Corsi at some point.

If I'm right, it would mean that wins and goals are actually better predictors of future success than possession, but it takes a long time to get enough data to use them.
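The resampling suggested in this comment might look something like the sketch below. Strictly speaking, redrawing a random 75/5 split several times is usually called repeated random sub-sampling rather than n-fold cross-validation, since the test sets can overlap from one draw to the next. The function name and the toy scoring function are hypothetical, and the game outcomes are randomly generated rather than real.

```python
import random

def repeated_holdout(n_games, test_size, repeats, score, rng):
    """Average score(train_idx, test_idx) over `repeats` random train/test splits."""
    total = 0.0
    for _ in range(repeats):
        order = rng.sample(range(n_games), n_games)  # random permutation of game indices
        test_idx, train_idx = order[:test_size], order[test_size:]
        total += score(train_idx, test_idx)
    return total / repeats

if __name__ == "__main__":
    rng = random.Random(5)
    # Made-up outcomes for one team: 1 = win, 0 = loss.
    outcomes = [1 if rng.random() < 0.55 else 0 for _ in range(80)]

    def gap(train_idx, test_idx):
        """Toy score: how far the training win rate is from the test win rate."""
        train = sum(outcomes[i] for i in train_idx) / len(train_idx)
        test = sum(outcomes[i] for i in test_idx) / len(test_idx)
        return abs(train - test)

    print(repeated_holdout(80, 5, 10, gap, rng))
```

With only 10 redraws of a 5-game test set the average will still be noisy, so in practice the number of repeats would likely need to be much larger.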
