Tuesday 16 October 2012

Results of Guess the Goal (& Cryptic Tease)

A week or so ago I challenged readers to deduce the number of goals scored by 10 players in the 2011/12 season from each player's underlying stats for the season: Shots (S), Shots in Box (Sin) and Shots on Target (SoT).  In judging, I prioritised the players' relative order over the actual goals scored.  From an FPL perspective I believe it's less important to predict the precise number of goals scored by Player A than to know that Player A will be better than Player B.
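To make the idea of judging relative order concrete, here is a minimal sketch of one possible scoring rule: the fraction of player pairs that a guess puts in the right order. This is not necessarily how the entries were actually judged. The function name `pairwise_order_score` and the `guess` figures are hypothetical, made up purely for illustration; the `actual` goals come from the results table in this post.

```python
from itertools import combinations

def pairwise_order_score(predicted, actual):
    """Fraction of player pairs whose predicted order matches the actual order.

    Pairs that are tied on actual goals are skipped, since there is no
    'right' order to guess for them.
    """
    agree = 0
    total = 0
    for a, b in combinations(actual, 2):
        actual_diff = actual[a] - actual[b]
        if actual_diff == 0:
            continue
        total += 1
        predicted_diff = predicted[a] - predicted[b]
        # Same sign means the guess ranked this pair the right way round.
        if actual_diff * predicted_diff > 0:
            agree += 1
    return agree / total if total else 0.0

# Actual non-penalty goals for three of the ten players (from the table).
actual = {"Agüero": 20, "Dempsey": 15, "Adebayor": 14}
# Hypothetical guess, invented for illustration only.
guess = {"Agüero": 18, "Dempsey": 16, "Adebayor": 17}

score = pairwise_order_score(guess, actual)
print(f"Pairs ordered correctly: {score:.2f}")
```

In this made-up guess every goal total is wrong, yet two of the three pairs are still in the right order, which is the property that matters most when choosing between Player A and Player B.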

Thanks first and foremost to JohnDoe2008 (aka SuperGrover), who took part and got the relative order almost correct, or as close as possible given that there were a couple of big outliers in the data.

I intentionally picked these 10 players to try and represent a healthy mix of players performing on, above and below the curve, not to make this harder, but to examine the difficulty posed when making predictions like this.

Please note: Goal totals exclude penalties and free-kicks.  These aren't normal shooting chances and are not representative of the shot data.  The most notable exclusions are Agüero and Adebayor, who both scored 3 penalties last season.

Player     Name            S    Sin  SoT  Goals
Player 1   Suarez          126  98   46   11
Player 2   Dempsey         142  84   53   15
Player 3   Ba              109  75   44   13
Player 4   Cissé           39   31   21   13
Player 5   Agüero          130  100  51   20
Player 6   Holt            77   68   33   13
Player 7   Hernández       47   42   19   9
Player 8   Adebayor        100  87   46   14
Player 9   Fletcher        71   55   27   12
Player 10  van der Vaart   102  46   42   10

Below are three charts showing the straight up correlation and R-squared value between each of the shot data columns and goals scored.

As can be seen, the relationship between each individual shot measure and goals scored is very poor when taken at face value.  However, you can see some trends hidden in there, particularly in the Shots in Box data, if you remove the two data points that are furthest from the line.  These outliers are picked out in red on the charts below.
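For anyone who wants to check the R-squared figures without the charts, the following is a minimal sketch that recomputes them from the table above, once for all ten players and once with the two outliers removed. It assumes a simple linear fit, where R-squared is the square of the Pearson correlation; the helper names `r_squared` and `r2_by_metric` are my own, not from any library.

```python
# (S, Sin, SoT, Goals) per player, copied from the table above.
players = {
    "Suarez": (126, 98, 46, 11),
    "Dempsey": (142, 84, 53, 15),
    "Ba": (109, 75, 44, 13),
    "Cissé": (39, 31, 21, 13),
    "Agüero": (130, 100, 51, 20),
    "Holt": (77, 68, 33, 13),
    "Hernández": (47, 42, 19, 9),
    "Adebayor": (100, 87, 46, 14),
    "Fletcher": (71, 55, 27, 12),
    "van der Vaart": (102, 46, 42, 10),
}
METRICS = ("S", "Sin", "SoT")

def r_squared(xs, ys):
    """R-squared of a simple linear fit (squared Pearson correlation)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy ** 2 / (sxx * syy)

def r2_by_metric(rows):
    """R-squared of each shot metric against goals for the given rows."""
    goals = [row[3] for row in rows]
    return {
        name: r_squared([row[i] for row in rows], goals)
        for i, name in enumerate(METRICS)
    }

outliers = {"Suarez", "Cissé"}
r2_all = r2_by_metric(list(players.values()))
r2_trimmed = r2_by_metric(
    [row for name, row in players.items() if name not in outliers]
)

for name in METRICS:
    print(f"{name}: all 10 = {r2_all[name]:.2f}, "
          f"without outliers = {r2_trimmed[name]:.2f}")
```

With only ten data points (eight once the outliers are dropped), these values are indicative at best, which is part of the reason a larger sample would be worth examining.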

It will not surprise you to learn that the outliers were Suarez and Cissé, with the latter essentially scoring the same number of goals (actually 2 more) from roughly a third of the opportunities.
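The gap between the two is easy to put in numbers. As a quick sketch using only the figures from the table, the snippet below works out goals per shot, per shot in the box and per shot on target for each player; the `conversion` helper is just an illustrative name.

```python
# Shot data and non-penalty goals for the two outliers, from the table.
stats = {
    "Suarez": {"S": 126, "Sin": 98, "SoT": 46, "Goals": 11},
    "Cissé": {"S": 39, "Sin": 31, "SoT": 21, "Goals": 13},
}

def conversion(player_stats):
    """Goals divided by each shot measure, i.e. goals per attempt."""
    return {
        metric: player_stats["Goals"] / player_stats[metric]
        for metric in ("S", "Sin", "SoT")
    }

rates = {name: conversion(s) for name, s in stats.items()}

for name, r in rates.items():
    print(f"{name}: {r['S']:.1%} of shots, "
          f"{r['Sin']:.1%} of shots in box, "
          f"{r['SoT']:.1%} of shots on target")

# Cissé's opportunities as a share of Suarez's.
shot_share = stats["Cissé"]["S"] / stats["Suarez"]["S"]
print(f"Cissé took {shot_share:.0%} as many shots as Suarez")
```

Cissé converted 13 of 39 shots, exactly one in three, while Suarez needed 126 shots for his 11 goals.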

The first thing this got me to do was re-evaluate the importance of Sin over or alongside SoT.  There is reason to include both factors as major drivers in any model.  But the bigger question this raised was what, if anything, is the major difference between the underlying performance data for Suarez and Cissé, subjectively one of the most wasteful strikers last season and the most clinical respectively?  There is one, and I've found it, or almost, but I will have to leave this to a later post for now.


  1. Would be interesting to follow this up, with a larger sample particularly
    Unfortunately there aren't any free sources for 'Sin' data so I'll have to stick with my measures :D

  2. I already have a follow-up for this planned which I hope to post this week. Do you publish your ratings anywhere? I saw something you posted about 6 weeks ago - any updates?