Monday 19 November 2012

Underlying data driving Goal Conversion

One of the pillars of this site, on which a lot of the modelling and forecasts are built, is that shot data such as shots on target and shots in the box are normally distributed and can therefore be predicted with decent success.  I wrote about this in my earlier write-up on the F.SCORE metric, and recently Soccer By The Numbers did a better job of demonstrating this in a post entitled Normal Football and Skewed Distributions.

So whilst shot totals are somewhat predictable, actual goals aren't.  The rule of thumb is that 1 in 3 shots on target ends up as a goal (31%), as does 1 shot in 9 overall, and indeed 1 shot in 3 is on target.  These values seem to have remained very constant for decades, and I originally built my model to use these average values for all teams in determining goals scored.
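The three rules of thumb hang together arithmetically, which a few lines of Python make explicit. This is only a sketch of the league-average shortcut described above; the function name `average_goals_from_shots` is made up for illustration and is not part of the model.

```python
# Rule-of-thumb league averages quoted in the post.
ON_TARGET_RATE = 1 / 3   # share of all shots that are on target
SOT_CONVERSION = 1 / 3   # share of shots on target that become goals

# Chaining the two gives the share of all shots that become goals: 1 in 9.
SHOT_CONVERSION = ON_TARGET_RATE * SOT_CONVERSION


def average_goals_from_shots(total_shots: float) -> float:
    """Goals an 'average' team would score from this many shots.

    Hypothetical helper: applies the same league-average rate to every team.
    """
    return total_shots * SHOT_CONVERSION


# 18 shots at the average rates is about 6 on target and about 2 goals.
goals_from_18_shots = average_goals_from_shots(18)
```

Using the quoted 31% rather than exactly one third gives roughly 1 goal per 9.7 shots, so the "1 in 9" figure is a rounding of the same relationship.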

Not all teams are equal though, quite obviously.  Some teams can convert shots on target at a much better rate than others.  Last season for example Stoke converted at roughly 40% compared to Liverpool down at 20%.  

However predictive Shots or Shots on Target are, though, they don't actually count for anything at the end of the day.  It's goals that count, goals, goals, goals, and understanding what, if anything, drives a team's Goal Conversion has been an obsession of mine for the last month or so.

I remember Roy Hodgson, after the England 1-1 France game at Euro 2012, taking some flak for the number of shots on target France had compared to England.  He replied that he didn't care much for stats such as that, and that it was more important that they didn't concede shots that could realistically have scored (although the goal they did concede was struck from outside the box and through someone's legs, if I recall).

But this highlights something that has to be considered, that not all shots on target are equal.  A shot on target in the penalty box is more likely to end up a goal than one taken from range.  A shot taken under pressure by a defender is less likely to go in than when a player has more time, and there are some interesting ways to begin to quantify such things.

Firstly, from the shot data you can determine what I called Penetration in a post a couple of months ago.  This makes total sense and has been incorporated in my model for several weeks for players and teams.

Penetration (P) = Shots in Box (Sin) / Total Shots (S). 

Secondly, there is what I am going to call StrikeRate.  This is a measure of how many shots a team takes in relation to its passes in the final third.  It makes sense too.  If you get the ball forward quickly, either through your normal style of play or on a counter attack, there are likely to be fewer defenders in the way.  If you play a lot of short passes, it gives your opposition time to get behind the ball.

StrikeRate (SR) = Total Shots (S) / Final Third Passes  (FTP) (currently I'm using passes received)
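As a minimal sketch, the two ratios above can be written as small Python functions. The function names and the match totals in the example are made up for illustration; only the formulas come from the definitions above.

```python
def penetration(shots_in_box: int, total_shots: int) -> float:
    """Penetration (P) = Shots in Box (Sin) / Total Shots (S)."""
    if total_shots <= 0:
        raise ValueError("total_shots must be positive")
    return shots_in_box / total_shots


def strike_rate(total_shots: int, final_third_passes: int) -> float:
    """StrikeRate (SR) = Total Shots (S) / Final Third Passes (FTP)."""
    if final_third_passes <= 0:
        raise ValueError("final_third_passes must be positive")
    return total_shots / final_third_passes


# Made-up totals for one team: 15 shots, 9 of them in the box,
# and 120 final third passes.
example_p = penetration(shots_in_box=9, total_shots=15)
example_sr = strike_rate(total_shots=15, final_third_passes=120)
```

With these invented numbers the team takes 60% of its shots in the box and has one shot for every eight final third passes.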

I've experimented with combining these factors and can get a very healthy looking relationship for a modelled Goal Conversion.  The chart below shows the correlation between the modelled and the actual values for just the teams' home games this season.  The R-squared value is for the teams in blue.  The three teams in red are Everton and QPR (below the line) and West Brom (above the line).  That the relationship can be this good for 17 out of 20 teams using just a handful of games is extremely promising, to be honest.  K is just a constant to line up the numbers.  Note that the goal conversion values for this season are adjusted values based on opponent.
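For anyone wanting to reproduce this kind of check, here is a rough sketch of computing R-squared between modelled and actual conversion while leaving out a few named teams. It does not reproduce the model itself, whose exact form isn't given here; the helper names and the four-team data set are entirely made up for illustration.

```python
def r_squared(modelled, actual):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(modelled)
    if n != len(actual) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_m = sum(modelled) / n
    mean_a = sum(actual) / n
    cov = sum((m - mean_m) * (a - mean_a) for m, a in zip(modelled, actual))
    var_m = sum((m - mean_m) ** 2 for m in modelled)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_m == 0 or var_a == 0:
        raise ValueError("both sequences must vary")
    return cov * cov / (var_m * var_a)


def r_squared_excluding(conversion_by_team, excluded_teams):
    """R-squared over all teams except those named in excluded_teams.

    conversion_by_team maps team name -> (modelled, actual) conversion.
    """
    kept = [pair for team, pair in conversion_by_team.items()
            if team not in excluded_teams]
    modelled = [m for m, _ in kept]
    actual = [a for _, a in kept]
    return r_squared(modelled, actual)


# Invented numbers, NOT real team data: team D is a deliberate outlier.
made_up = {"A": (0.1, 0.2), "B": (0.2, 0.4), "C": (0.3, 0.6), "D": (0.4, 0.1)}
fit_all = r_squared_excluding(made_up, excluded_teams=set())
fit_without_d = r_squared_excluding(made_up, excluded_teams={"D"})
```

Dropping outliers will always flatter the fit, so it is worth reporting the R-squared both with and without the omitted teams.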

A lot more analysis needs to be done though.  Looking at last season's home data the relationship is not so good (R-squared<0.5 with 4 teams omitted), although the general trend is apparent.  The away data is all over the place though, both this season and last.  But this is promising too, as it hints at a team's approach to an away game being different from its approach at home.

The really promising thing for me is that from some fairly simple tinkering with underlying data we can start to get near to determining a team's likely goal conversion rate, something that at times seems almost random.  We can use this relationship to support or question a team's performances: to see whether they really should be converting chances at the observed rate, or have just been a tad fortunate.

There are other factors too which I've begun to look at that show some positive association with goal conversion, such as final third pass accuracy, percentage of headed shots and number of crosses - all really match stats that describe a team's style of play.

I've also started looking at individual players' conversion rates using the above ideas and this shows some promise too.  For example, last season Suarez would have a shot for every 10 touches in the final third (StrikeRate ~10%).  P. Cisse would have a shot per 6 touches (16%), indicating that he is in the team just to be a striker, whereas Suarez likes to get involved in the play more and perhaps doesn't always get a clear sight of goal.  Alternatively, it may be that the way Newcastle play (direct) is beneficial to the main striker.
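The player figures are just the reciprocal of "touches per shot", as this small sketch shows. The function name is made up for illustration, and the two inputs are the rounded touch counts quoted above rather than raw data.

```python
def strike_rate_from_touches_per_shot(touches_per_shot: float) -> float:
    """Player StrikeRate as shots per final third touch."""
    if touches_per_shot <= 0:
        raise ValueError("touches_per_shot must be positive")
    return 1.0 / touches_per_shot


# Rounded touch counts quoted in the post.
suarez_sr = strike_rate_from_touches_per_shot(10)  # one shot per 10 touches
cisse_sr = strike_rate_from_touches_per_shot(6)    # one shot per 6 touches
```

One shot per 6 touches works out at about 16.7%, so the 16% quoted above presumably comes from an unrounded touch count slightly above 6.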

Players like Dzeko, Cisse, Ba, Balotelli and Hernandez - all traditional penalty box "strikers" - have league-high StrikeRates and good goal conversion, whereas players like Rooney, Suarez, Adebayor, Torres and Tevez have lower StrikeRates and/or lower Penetration.  These are players more involved in the play, more of a Forward than a Striker, if you will allow me that distinction.

I'll be following this up later this week with some data and examples.  Like I have said already, this is not a validated relationship yet, but hopefully the beginnings of one.  I have not begun to factor a team's or player's StrikeRate into my point projections, but it's something I'll be hoping to test and develop through the season.


  1. A truly thought provoking read Ste. I like your idea of the StrikeRate parameter, the justification for it sounds pretty much like the German national football team's style of play (although you can also say that for Allardyce's as well.....).

    I'd be interested to know how the teams which tend to rely on patient build-up play fare with this model (e.g last year Swansea)

    I'm going to use your equations to play around with, if you don't mind....superb job Ste!

    El Traca

  2. Excellent stuff as Traca said. Definitely interested in the patient build-up (i.e., tiki-taka) and how that fares in your model. My guess is that while each individual possession isn't as valuable for teams with patient build-up, the sheer number of possessions makes them nonetheless effective in attack.

  3. Fantastic stuff. One thing you may consider looking at is Opta's subjective Clear Cut Chances metric. As defined by Opta, "A situation where a player should reasonably be expected to score usually in a one-on-one scenario or from very close range"

    One thing that is interesting is that last season Suarez converted CCC at 33% and this season it is 55%. I think this metric is useful for analysing if someone has been lucky or unlucky with their goals.

    Conversely, Giroud would be in the unlucky category, because he has converted CCC at 18%, which is very low.

  4. I'll take a look at the patient style of play vs. more direct and goal conversion, shots per touch etc. SG is right though that the more you pass, the more patient your play, the more possession you have and the more shots you get. I'll take a look at this too.

    Anon, thanks! SuperGrover (comment above yours) has done some great work with CCC and I hope to add this layer to my data in the coming weeks.