Thursday 13 March 2014

Testing of Football Stats: Part 2 - "Predictive"

A couple of days ago I used a correlation matrix to compare a bunch of different underlying football stats and metrics with the top-level stuff like goals and points from the same set of games. A correlation matrix has to be considered a rough-and-ready first step to see where the best and worst relationships lie rather than a conclusive test in itself.

Below is the correlation matrix for the same bunch of stats, but run from one group of 38 games to the next 38, with no overlapping games. The method I’ve chosen uses rolling sets of 38 games rather than season-to-season or half-season-to-half-season comparisons: for each team I pair games 1-38 with games 39-76, then games 2-39 with games 40-77, and so on.
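
If you want to build the same thing yourself, here’s a rough sketch of the pairing in Python/pandas. It’s not my actual workings, and the column names ('team', 'goals_for', 'xg_for' and so on) are just placeholders for whatever your data uses:

```python
import pandas as pd

WINDOW = 38  # size of each block of games

def paired_windows(matches, cols, window=WINDOW):
    """For each team, pair per-game averages from games i..i+37 with those
    from games i+38..i+75, sliding the start forward one game at a time."""
    rows = []
    for team, games in matches.groupby('team'):
        games = games.reset_index(drop=True)
        for start in range(len(games) - 2 * window + 1):
            first = games.iloc[start:start + window][cols].mean()
            second = games.iloc[start + window:start + 2 * window][cols].mean()
            rows.append(pd.concat([first.add_suffix('_w1'), second.add_suffix('_w2')]))
    return pd.DataFrame(rows)

# matches: one row per team per game, sorted by date, with placeholder columns
# paired = paired_windows(matches, ['goals_for', 'goals_against', 'xg_for', 'xg_against', 'points'])
# corr_matrix = paired.corr()  # window-1 metrics against window-2 metrics
```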

[Correlation matrix: each metric from one 38-game window against the following 38 games]
I’ve highlighted just the areas of prime interest, which are mostly Goals For, Goals Against and Points Per Game (PPG).

A quick reminder that my primary objective is to predict goals scored over a short spell of matches for the fantasy football work I do over at InsideFPL, although this analysis is so damn interesting that who knows where it'll end up.

Starting with Goals For and Against then (GF+ & GA+, 1st and 4th columns), we can see the strongest correlations are with underlying stats rather than with goals themselves. This is pretty much known already, but it’s good to see it laid out like this; my favourite thing about a correlation matrix is the quick overview you get of all these relationships at once.

Expected Goals (xG) beats Expected Goals from SOT (xGT). I mention this specifically because in the last post xGT had the stronger correlation with Goals when looking back, but here xG looks the better metric going forward. This suggests to me that whilst getting shots on target will likely yield more goals, it’s not necessarily a sustainable metric. I really should have included Shot Accuracy in this analysis in the same way I did the other “skill metrics” like Goal Conversion, but alas, next time.

The two ratios I’m fond of right now (and spending far too much time with) are looking good in the matrix. Again, I hasten to add that a correlation matrix is just a high-level overview, but hopefully you can see why I've grown fond of them. They make sense too: if Expected Goals is an acceptable measure of a team’s ability to create goalscoring chances, then Expected Goals Difference (xGD) seems the next logical step. I could probably waffle on for too long about the subjective difference between xGR (Expected Goals Ratio) and xGD, how one accounts for “Control” and the other for “Power”, but so far my testing has shown no significant difference between the two, so I’ll leave it at that for now.
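
In their simplest form the two look something like this (a minimal sketch, assuming the standard definitions rather than any tweaked versions):

```python
def xgd(xg_for, xg_against):
    """Expected Goals Difference over a set of games."""
    return xg_for - xg_against

def xgr(xg_for, xg_against):
    """Expected Goals Ratio: a team's share of the expected goals in its games."""
    return xg_for / (xg_for + xg_against)

# e.g. a team creating 60 xG and conceding 40 xG over 38 games:
# xgd(60, 40) -> 20, xgr(60, 40) -> 0.6
```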

‘Predicted xGD’ was my quick attempt to adjust for opposition strength in my data, although this shouldn't be significant over 38 matches. It is simply the 38-game average xGD of each team minus the xGD of the opponent on a match-by-match basis, then averaged up over the 38 games. It’s actually what led me to get carried away and build a ScorePrediction Tool (still in Beta testing - use with caution!).
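
In sketch form, that adjustment is something like this (one reading of the description above, with hypothetical names):

```python
def predicted_xgd(team, opponents, avg_xgd):
    """For each of a team's 38 fixtures, take its 38-game average xGD minus
    the opponent's average xGD, then average those differences.

    opponents: the 38 opponents faced in the window.
    avg_xgd:   dict mapping each team to its 38-game average xGD.
    """
    diffs = [avg_xgd[team] - avg_xgd[opp] for opp in opponents]
    return sum(diffs) / len(diffs)
```

Under this reading it boils down to a team’s own average xGD minus the average xGD of the opponents it faced over the window.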

This is all looking pretty healthy at this point and confirms what others have done recently (see Michael Caley - What is the Best Method for Predicting Football Matches?). However, as with all healthy dishes, a pinch of salt is required. Following Michael's example, my next step was to take a closer look at some of the most promising metrics and perform Error Testing. The results of that will be my next post, and whilst it does pour some cold water on these initial results, it also pointed out some new paths for getting much closer to the data and pulling it apart some more, rather than looking at it all reduced to single numbers (e.g. why did Chelsea have such a big Points-to-xGD error under Benitez?).
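
To give a flavour of that error testing ahead of the next post, the idea is roughly this (again a sketch with hypothetical names, reusing the paired windows from the earlier snippet):

```python
def mean_absolute_error(predicted, actual):
    """Average size of the miss, ignoring direction."""
    return (predicted - actual).abs().mean()

# e.g. how far off is window-1 xGD as a stand-in for window-2 goal difference?
# mae = mean_absolute_error(paired['xg_for_w1'] - paired['xg_against_w1'],
#                           paired['goals_for_w2'] - paired['goals_against_w2'])
```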

I'm also pretty keen to start looking at Player Impact on xGD, both relative to team and league averages, and to start investigating some key player stuff, as well as understanding some of the variation I'm seeing in team trends.
