Thanks first and foremost to JohnDoe2008 (aka SuperGrover), who took part and got the relative order almost correct, or as close as was possible given a couple of big outliers in the data.
I intentionally picked these 10 players to try and represent a healthy mix of players performing on, above and below the curve, not to make this harder, but to examine the difficulty posed when making predictions like this.
Please note: goal totals exclude penalties and free-kicks. These are not normal shooting chances and are not representative of the shot data. The most notable exclusions are Agüero and Adebayor, who both scored 3 penalties last season.
Player | Name | S (shots) | Sin (shots in box) | SoT (shots on target) | Goals |
Player 1 | Suarez | 126 | 98 | 46 | 11 |
Player 2 | Dempsey | 142 | 84 | 53 | 15 |
Player 3 | Ba | 109 | 75 | 44 | 13 |
Player 4 | Cissé | 39 | 31 | 21 | 13 |
Player 5 | Agüero | 130 | 100 | 51 | 20 |
Player 6 | Holt | 77 | 68 | 33 | 13 |
Player 7 | Hernández | 47 | 42 | 19 | 9 |
Player 8 | Adebayor | 100 | 87 | 46 | 14 |
Player 9 | Fletcher | 71 | 55 | 27 | 12 |
Player 10 | van der Vaart | 102 | 46 | 42 | 10 |
Below are three charts showing the straight correlation and R-squared value between each of the shot data columns and goals scored.
As can be seen, the relationship between each individual shot measure and goals is very poor when taken at face value. However, there are some trends hidden in there, particularly in the Shots In Box data once you remove the two data points furthest from the line, the outliers, which are now picked out in red on the charts below.
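For anyone who wants to reproduce the numbers behind the charts, here is a minimal sketch of the R-squared calculation using only the ten rows from the table above. It assumes a plain least-squares line per column (so R-squared is just the squared Pearson correlation), and it hard-codes the two excluded players as Suarez and Cissé, the outliers named in the next paragraph. The names `PLAYERS`, `r_squared` and `r_squared_by_column` are my own for illustration.

```python
# Rows copied from the table above: (name, S, Sin, SoT, goals).
PLAYERS = [
    ("Suarez", 126, 98, 46, 11),
    ("Dempsey", 142, 84, 53, 15),
    ("Ba", 109, 75, 44, 13),
    ("Cissé", 39, 31, 21, 13),
    ("Agüero", 130, 100, 51, 20),
    ("Holt", 77, 68, 33, 13),
    ("Hernández", 47, 42, 19, 9),
    ("Adebayor", 100, 87, 46, 14),
    ("Fletcher", 71, 55, 27, 12),
    ("van der Vaart", 102, 46, 42, 10),
]

# Assumption: these are the two points dropped from the second set of charts.
OUTLIERS = {"Suarez", "Cissé"}


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return (sxy * sxy) / (sxx * syy)


def r_squared_by_column(players):
    """R-squared of each shot column (S, Sin, SoT) against goals."""
    goals = [p[4] for p in players]
    columns = (("S", 1), ("Sin", 2), ("SoT", 3))
    return {label: r_squared([p[i] for p in players], goals)
            for label, i in columns}


all_ten = r_squared_by_column(PLAYERS)
trimmed = r_squared_by_column([p for p in PLAYERS if p[0] not in OUTLIERS])

for label in all_ten:
    print(f"{label:>3}: all ten = {all_ten[label]:.2f}, "
          f"without outliers = {trimmed[label]:.2f}")
```

With only ten (or eight) data points these values are very sensitive to individual players, so treat them as descriptive rather than predictive.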
It will not surprise you to learn that the outliers were Suarez and Cissé, with the latter scoring essentially the same number of goals (actually 2 more) from roughly a third of the opportunities.
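The "roughly a third of the opportunities" point is easy to check from the table. This is a small sketch of that arithmetic; the conversion-rate measure (goals divided by shots) is my shorthand here, not something the charts plot, and the variable names are illustrative.

```python
# Rows copied from the table above: name -> (S, Sin, goals).
ROWS = {
    "Suarez": (126, 98, 11),
    "Cissé": (39, 31, 13),
}

# Goals per shot and goals per shot in the box for each player.
conversion = {
    name: {"per_shot": goals / shots, "per_shot_in_box": goals / in_box}
    for name, (shots, in_box, goals) in ROWS.items()
}

# Cissé's volume of chances as a fraction of Suarez's.
shot_ratio = ROWS["Cissé"][0] / ROWS["Suarez"][0]
in_box_ratio = ROWS["Cissé"][1] / ROWS["Suarez"][1]

for name, rates in conversion.items():
    print(f"{name}: {rates['per_shot']:.1%} of shots, "
          f"{rates['per_shot_in_box']:.1%} of shots in box")
print(f"Cissé took {shot_ratio:.0%} of Suarez's shots "
      f"and {in_box_ratio:.0%} of his shots in the box")
```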
The first thing this got me to do was re-evaluate the importance of Sin over or alongside SoT. There is reason to include both factors as major drivers in any model. But the bigger question this raised was: what, if anything, is the major difference between the underlying performance data for Suarez and Cissé, subjectively one of the most wasteful strikers last season and the most clinical? There is one, and I've found it, or almost, but I will have to leave this to a later post for now.
Would be interesting to follow this up, with a larger sample particularly
Unfortunately there aren't any free sources for 'Sin' data so I'll have to stick with my measures :D
I already have a follow-up for this planned which I hope to post this week. Do you publish your ratings anywhere? I saw something you posted about 6 weeks ago - any updates?