Sunday 23 November 2014

Probability of a Goal from Expected Goal Data

This is a post to share a thing that might be a new way of using expected goal data. Rather than adding up the individual shot probabilities to get a summed total (a.k.a. Expected Goals) we can use the xG values of each individual shot in a match to work out the probability of a team scoring in that game.

This uses the idea of shot quality or chance quality, and that less quantity (shots) and more quality (per shot, aka xG) is a good thing. Sometimes less is more. I dabbled with this idea about a year ago in this post about Manchester United's 2012/13 season. In this I presupposed that United's exceptional shot quality that season was a possible explanation for them scoring considerably more goals than their expected goal total alone would suggest. Presuppose. Possibly. Yeah. The method was a bit of a fudge and from further testing/dabbling didn't go anywhere.

Earlier this year Mark Taylor took a much better swipe at the idea with his post Twelve Shots Good, Two Shots Better, where he demonstrated by simulation that Team A attempting 2 big chances on goal (xG = 2 * 0.60) would beat Team B who took 12 lesser chances (xG = 12 * 0.10). Both teams have the same total goal expectation from the game (1.20) but from his simulation Team A wins out in 37% of the simulated games versus Team B's 32%.
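For anyone who wants to poke at that comparison themselves, here is a rough sketch that works out the match outcome probabilities exactly rather than by simulation. It is not Mark's code; the function names are my own, and it assumes every shot is an independent event with a scoring probability equal to its xG.

```python
def goal_distribution(shot_xgs):
    """Exact distribution of goals scored, assuming independent shots.

    Returns a list where index k holds the probability of exactly k goals.
    """
    dist = [1.0]  # before any shots: 0 goals with certainty
    for p in shot_xgs:
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1 - p)   # this shot is missed
            new[k + 1] += prob * p     # this shot is scored
        dist = new
    return dist


def match_outcomes(xgs_a, xgs_b):
    """Return (P(A wins), P(draw), P(B wins)) for two lists of shot xG values."""
    dist_a = goal_distribution(xgs_a)
    dist_b = goal_distribution(xgs_b)
    win_a = sum(pa * pb
                for i, pa in enumerate(dist_a)
                for j, pb in enumerate(dist_b) if i > j)
    draw = sum(pa * pb
               for i, pa in enumerate(dist_a)
               for j, pb in enumerate(dist_b) if i == j)
    return win_a, draw, 1.0 - win_a - draw


team_a = [0.60] * 2    # two big chances
team_b = [0.10] * 12   # twelve lesser chances
win_a, draw, win_b = match_outcomes(team_a, team_b)
print(f"Team A win {win_a:.1%}, draw {draw:.1%}, Team B win {win_b:.1%}")
```

Under those assumptions Team A comes out at roughly 37% and Team B at roughly 32%, in line with the simulated figures.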

Here's my method for calculating Probability of a Goal (pG) using a team's individual shot/xG data. For each fixture take the xG values for each individual shot and then determine the probability that the team does not score at all from any of those shots. Then subtract this from 1.

pG = 1 - [(1-xG1)*(1-xG2) *...* (1-xGn)]

This is the probability that a team will score 1 goal or more in a match having taken n shots, where shot i has an individual scoring probability of xGi. It treats each shot as an independent event.
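As a minimal sketch of the formula (the function name prob_of_goal is mine, and the shot values are the illustrative ones from Mark's two-team example rather than real match data):

```python
import math


def prob_of_goal(shot_xgs):
    """pG: probability of scoring at least one goal from a list of shot xG values.

    Assumes shots are independent, so the chance of not scoring at all
    is the product of each shot's miss probability (1 - xG).
    """
    p_no_goal = math.prod(1.0 - xg for xg in shot_xgs)
    return 1.0 - p_no_goal


two_big_chances = [0.60, 0.60]     # total xG = 1.20
twelve_small_chances = [0.10] * 12  # total xG = 1.20

print(round(prob_of_goal(two_big_chances), 3))
print(round(prob_of_goal(twelve_small_chances), 3))
```

Both teams have the same xG total of 1.20, but pG comes out at about 0.84 for the two big chances and about 0.72 for the twelve small ones. A match with no shots gives pG = 0.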

To the results, or I should say the differences between Expected Goals and the freshly calculated 'probability of a goal', pG. I'm cautious to say "results" as xG has been well tested and proven to be decent, whereas pG hasn't and is new, to me at least, and will need much more testing and stuffs. As you'll see from the chart though there are some significant differences between these two values even though they are determined from the exact same data.

Below is a plot of average xG vs average pG from the first 11 games of this season. The data does not include penalties or own goals. Beneath the chart I'll go over the key "winners and losers" as I see them.

  • Chelsea are the most significant winner. They go from being placed 5th amongst all teams for xG to first for pG. This means they have the best probability of scoring at least one goal in any given match.
  • Arsenal, City and Southampton lose ground to Chelsea. One reason they may be penalised is for being very good against some teams but at the same time not good enough against others, and this perhaps could be the most relevant thing about pG. It's no good hammering one team and padding out your xG total if you then struggle to create equivalent chances in your other games.
  • Some teams could also be accused of not having a real cutting edge this season (relative to Chelsea). When big teams face smaller clubs the underdog will naturally and - importantly - deliberately concede possession and defend deep in numbers. "Big Team" will get lots of ball in their opposition's third, and will be afforded lots of chances... but if they overplay or are not incisive enough then they won't be able to create chances of any real quality.
  • Everton and WBA are also big winners here. Everton draw level (horizontally) with the trio of United, West Ham and Spurs which perhaps reflects their actual goalscoring record so far this season. West Brom..  yeah.. dunno. 
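
To make the "hammering one team" point concrete, here is a small sketch with two made-up teams. The shot values are entirely hypothetical (not taken from the data behind the chart), and the function names are my own. Both teams average the same xG per game, but one piles it all into a single match.

```python
import math


def prob_of_goal(shot_xgs):
    """Probability of at least one goal, assuming independent shots."""
    return 1.0 - math.prod(1.0 - xg for xg in shot_xgs)


def averages(matches):
    """Return (average xG, average pG) over a list of matches.

    Each match is a list of per-shot xG values.
    """
    avg_xg = sum(sum(shots) for shots in matches) / len(matches)
    avg_pg = sum(prob_of_goal(shots) for shots in matches) / len(matches)
    return avg_xg, avg_pg


# Hypothetical data: two matches each, same total xG of 2.2
flat_track_bullies = [[0.20] * 10, [0.10] * 2]  # xG 2.0 then xG 0.2
steady_eddies = [[0.22] * 5, [0.22] * 5]        # xG 1.1 in both

for name, matches in [("Bullies", flat_track_bullies), ("Steady", steady_eddies)]:
    avg_xg, avg_pg = averages(matches)
    print(f"{name}: average xG {avg_xg:.2f}, average pG {avg_pg:.2f}")
```

Both sides average 1.10 xG per game, but the steady side's average pG is noticeably higher (about 0.71 against about 0.54), because the big match can't push pG above 1 while the poor match drags it right down.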
To quickly wrap up, I like that pG uses the same data as xG does but takes advantage of a greater amount of information available in the raw data. I shall have to do some more prodding around of previous seasons' numbers and see where it takes me.