Friday, 21 February 2014

Variation in Player ExpG Conversion across the Premier League

In my earlier post today I dropped in a table containing "finishing" stats for 100+ Premier League players across 2.5+ seasons. I put finishing in quotation marks there because this stat is not really what most people would naturally understand as finishing: the data includes long shots, headers, and no doubt an unintentional ricochet off the odd backside. What it actually is is the ratio of a player's goals scored to the average expected goals scored (ExpG), based on shot location, shot type and pass type.

The bar charts in the table represent the "finishing" or ExpG Conversion of successive and unique buckets of 40 shots. A good thing about these 40 shot buckets is that there are very few 0% conversions, i.e. when a player scores 0 goals from 40 shots, at least among the most frequent 55 shooters I've looked at here. There were just 4 instances out of the 158 player-bucket samples where the ExpG conversion was 0%.
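For anyone wanting to reproduce the bucketing, here is a minimal sketch in Python. The function name and the `(scored, expg)` tuple layout are my own assumptions for illustration, not the format of the underlying data, and it assumes every bucket has a total ExpG above zero.

```python
def bucket_conversions(shots, bucket_size=40):
    """Return ExpG conversion (goals / expected goals) for each full bucket.

    shots: chronological list of (scored, expg) tuples, where scored is
    True/False and expg is that shot's expected-goal value.
    This data layout is hypothetical, chosen only for the example.
    """
    conversions = []
    # Walk the shots in non-overlapping chunks; a trailing partial bucket is dropped.
    for start in range(0, len(shots) - bucket_size + 1, bucket_size):
        bucket = shots[start:start + bucket_size]
        goals = sum(1 for scored, _ in bucket if scored)
        expected = sum(expg for _, expg in bucket)
        # Assumes expected > 0, which holds whenever every shot has some ExpG.
        conversions.append(goals / expected)
    return conversions
```

A conversion of 1.0 (100%) means the player scored exactly as many goals as an average shooter would expect from those 40 shots.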

On doing this table a month or so ago I immediately saw an opportunity to test for the repeatability of ExpG conversion, and @11tegen11's recent post 'How to Scout a Striker', where he did a similar thing, spurred me on to get this done.

For every player with at least 3 x '40 shot buckets' (i.e. 120+ shots since 2011/12) I have drawn a scatter plot below of their nth bucket against their (n+1)th. From the main table this data is for the first 55 players when sorted by total shots (as it is by default), from Luis Suarez to Kevin Mirallas.



It's a mess, right? There is no correlation at all between a player's conversion rate from one set of 40 shots and his next 40 shots, and indeed this is a very similar result to that reported by @11tegen11 in his post.
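The pairing behind that scatter plot can be sketched roughly as follows. The function names and the dictionary layout are assumptions for illustration; the correlation is a plain Pearson coefficient written out by hand so the example needs nothing beyond the standard library.

```python
from math import sqrt

def lag_pairs(buckets_by_player, min_buckets=3):
    """Pair each bucket with the same player's next bucket.

    buckets_by_player: dict of player name -> chronological list of
    bucket conversions (a hypothetical layout for this example).
    """
    pairs = []
    for conversions in buckets_by_player.values():
        if len(conversions) < min_buckets:
            continue  # skip players with fewer than 3 buckets (under 120 shots)
        pairs.extend(zip(conversions, conversions[1:]))
    return pairs

def pearson_r(pairs):
    """Pearson correlation between the first and second item of each pair."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)
```

Pairs are only formed within a single player's sequence, so one player's last bucket is never matched with another player's first.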

When I looked at the data in the main table though I couldn't help noticing a few things. It would be handy if you opened the table in a new window and used the search box to take a look. Players who have notched enough shots to show some sort of trend, like RVP and Suarez, seem to have a generally stable conversion rate with an occasional high or low. Some players have higher highs or lower lows too. Additionally, some players can be seen to be improving pretty dramatically (see Sturridge, Lallana, Rodriguez) whilst some appear to peter out (Defoe, Dempsey, Lampard). Other players enjoy a super purple patch (Cabaye, Adam Johnson) or suffer a bad patch (Bale, Ba, Dzeko).

Basically, looking through the table at individual players, I got the feeling that there's far too much not being accounted for with single players to make the above measure of repeatability a successful one. There's improvement with age (Sturridge/Lallana?) and adaptation to the Premier League (Lallana again?). There's getting old, getting injured, or both. Changing clubs, changing managers, team mates injured or leaving. Lots.

This made me wonder, given that I'm unable to control for all these factors, what kind of conversion rate is normal or 'acceptable' in the Premier League. I took all the buckets and plotted the distribution.


As this appears normally distributed I went ahead and created a randomly generated set of normally distributed data with the same mean (1.03), standard deviation (0.499) and n (158) and plotted the "repeatability" of this data set from n to n+1 to mimic what I did for the shot buckets. Note this is a frequency histogram.






The same messy pattern and no correlation between one result and the next. But of course! The individual points are all independent events. One result does not influence the next. Could the same be said of shots?
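The random comparison is easy to try at home. This is only a sketch of the idea: the mean, standard deviation and n are the figures quoted above, while the function name and the fixed seed are my own choices, and it will not reproduce the exact random numbers behind my plot.

```python
import random
from math import sqrt

def simulated_lag_correlation(mean=1.03, sd=0.499, n=158, seed=0):
    """Correlate each random 'bucket' with the next one in the sequence."""
    rng = random.Random(seed)  # the seed is arbitrary, fixed only so runs are repeatable
    values = [rng.gauss(mean, sd) for _ in range(n)]
    xs, ys = values[:-1], values[1:]  # nth value against (n+1)th value
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)
```

Because every value is drawn independently, the result should sit close to zero whatever seed is used, which is the same lack of pattern the shot buckets showed.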

What does this mean? Well, I don't know to be honest, perhaps a more learned statistician/football dude can offer some ideas, but a few thoughts spring to mind.

Without being able to control for things like age and injury it does not seem appropriate to assess or judge a player's ability to repeat his conversion rate, or to expect a conversion rate to repeat when such a large variance is seen.

Instead, perhaps players can be evaluated based on their own distribution within the league distribution: are they consistently better than average, and what is their variance? Perhaps a method could be borrowed from statistical process control (SPC). For example, control charts with limits and zones in which to track a player's ability to score from his shots.
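As a rough sketch of what the zones on such a chart could look like, the snippet below labels a bucket by how many standard deviations it sits from the centre line, using the usual SPC naming (zone C within 1 sigma, zone B between 1 and 2, zone A between 2 and 3). Using the league mean (1.03) and standard deviation (0.499) as the centre line and sigma is my assumption for this example; a textbook control chart would normally estimate them from the process being tracked.

```python
def control_zone(value, center=1.03, sigma=0.499):
    """Classify a bucket conversion by its distance from the centre line.

    center and sigma default to the league-wide figures, which is an
    assumption made for illustration rather than a standard SPC recipe.
    """
    distance = abs(value - center) / sigma  # distance in units of sigma
    if distance > 3:
        return "out of control"  # beyond the 3-sigma control limits
    if distance > 2:
        return "zone A"  # between 2 and 3 sigma
    if distance > 1:
        return "zone B"  # between 1 and 2 sigma
    return "zone C"  # within 1 sigma of the centre line
```

With these numbers a bucket at exactly 0% conversion still lands inside the 3-sigma limits, which says something about how wide the variation is at 40 shots.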

Quick example knocked up below for Rooney and RVP's last 280 shots. Although both strikers' conversion can vary a lot from bucket to bucket, they are both basically just varying within the middle 'average zone'. Both are 'in control', as they say. Could you really say one is better than the other? I don't think so.

Also, as RVP is considered one of the best strikers around and has scored the most goals, this backs up an assertion often made lately that getting lots of shots in on goal is perhaps the real skill, rather than putting them away.



One thing I think can be said from looking at ExpG conversion this way is that Daniel Sturridge's current form is off the charts! Since this was done in GW22 he's scored another 5 goals. It'd be interesting to see what his blip below 50% was due to: was it just variation or was there an assignable special cause? Not in the mood at Chelsea perhaps? Anyway, time for lunch.




