Wednesday, 26 March 2014

Man United Season Comparison

United are obviously struggling this season, but until the recent home defeats to Liverpool and City I noticed a lot of people and media on the fence about where the Old Trafford club really stand. There have been some good results at times, and the narrative has been that Moyes must be a good manager: he's been unlucky, he's inherited a weakening squad from Ferguson, who himself got away with it last year.

In the following two charts I've compared performances and results from the same games this season and last season. United's opposition are lined up approximately from best on the left to worst on the right so that you can easily see how United perform against the better sides compared to the worst. 

The colour of each bar is the result: won, drawn or lost. The height of the bar is my measure of performance using the 'expected goals' difference or xGD, this being the difference between the volume and quality of chances created by each side. Based on historical results, an xGD of +0.5 makes a win 50% likely, with a draw or loss at 25% each. An xGD of +1.0 is 60% win, 25% draw, 15% loss. It's a linear relationship, so think in reverse for negative xGD (e.g. -0.5 = 50% probability of a loss).
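As a rough sketch of that mapping, here's my own linear interpolation between the two quoted anchor points (+0.5 xGD = 50/25/25, +1.0 xGD = 60/25/15), mirrored for negative xGD. This is an illustration of the stated relationship, not the underlying model:

```python
def result_probs(xgd):
    """Return (win, draw, loss) percentages for a given xGD.

    Linear fit through the two quoted anchor points:
    +0.5 -> 50/25/25 and +1.0 -> 60/25/15."""
    if xgd < 0:
        # mirror image: a -0.5 xGD means a 50% chance of losing, etc.
        win, draw, loss = result_probs(-xgd)
        return (loss, draw, win)
    win = min(40.0 + 20.0 * xgd, 100.0)
    draw = 25.0 if win < 75.0 else 100.0 - win
    loss = 100.0 - win - draw
    return (win, draw, loss)

print(result_probs(0.5))   # (50.0, 25.0, 25.0)
print(result_probs(-1.0))  # (15.0, 25.0, 60.0)
```

The extrapolation outside the two quoted points is my guess; the post only gives the 0.5 and 1.0 values.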

Last Season


Starting on the left with United's toughest home games, and looking at the same fixtures as played this season, we can see they dropped points in only 3 of them, two of those against closest rivals City and Chelsea, the other against Tottenham. In all three defeats United were only marginally outplayed based on chances created and 'expected goals' difference (xGD); possibly games they could have drawn or won on another day.

For their home wins, most were on the back of convincing displays with plenty of good chances created and few conceded. An element of good fortune/finishing/saves could certainly have been a factor in wins over Arsenal and West Ham.

Away from home, United looked outplayed by City but picked up the 3 points largely thanks to awesomeness from Van Persie. They dominated most games against their closest rivals though, outplaying Liverpool, Chelsea and Arsenal and picking up 7 points in the process. Against the mid-table sides, United failed to pick up what appear to be deserved wins against everybody except Spurs. Aside from a shock and undeserved defeat at Carrow Road, United beat all the lower-table sides on the road, but their performances look far from convincing against QPR, Fulham and Sunderland.

Overall I would not say United got lucky with their 89-point total last season. At home, maybe they snuck a cheeky 4 points from the Arsenal and West Ham games. Away, they look to have dropped more points than they gained based on xGD performance.

This Season

The home whuppings by Liverpool and City look very bad on Moyesie's United CV but only Chelsea can say they've stood up to these two this season. Last season's home performance against City isn't a million miles away from this season, plus Moyes can't be blamed for the form of Liverpool. How would a Ferguson-led United do against either of these two this season? City have beaten United 4-1 and 6-1 in past seasons.

United had the best of the chances in an early-season clash with Chelsea, although Mourinho set his side up firmly for a 0-0. A convincing performance didn't yield any points against Everton and Martinez's guerrilla football. Defeats to Spurs, Newcastle and WBA came in very close games that could have turned out differently on another day. These defeats are offset by victories over Arsenal and Stoke despite not bossing the chances. Liverpool and City aside, against the top teams United's performances haven't been all that different from last season, although the results have.

Against the lower league sides United have been mostly dominant at home, and generally much more so than last season. The Fulham draw is definitely 2 points dropped.

Away from home United actually lead the league in points this season, as they did last year. This time around their performances look much less convincing, though, rarely getting close to the +1.0 xGD mark, and from a purely statistical point of view it would appear they've lucked into a few victories on the road.

It has to be said, though, that they have had the majority of chances in most of their away games and their finishing has been excellent. I believe the likes of Rooney and Van Persie are definitely above-average strikers, especially compared to what the bottom half of the table can put out. It's all very well creating chances; they still need to be finished.

Whilst they won close(ish) encounters away from home against lower- to mid-table clubs, United have failed to get much out of similar close games against their main rivals. Van Persie missed a great chance to rescue a point early in the season at Liverpool. A 10-minute Eto'o hat-trick at Stamford Bridge, the first a looped deflection off Carrick, knocked the stuffing out of any chance of a result there.

One thing to note is that the profile or distribution of results and performances looks very different between this season and last. Under Fergie, United hit a steady and consistent level of performance against teams regardless of their standing in the league, whilst under Moyes performances and results are all over the place and a lot more skewed toward the weaker end of the league.

This may be down to a trait I've been playing about with which I'll call 'Influence'. Does a team dictate the number of chances in a match for both teams? Or does a team simply perform to their own level within the terms set by their opposition? For example, Liverpool and City look very influential: it doesn't matter if they go to the home of the most defensive team in the league, they'll still play and perform the way they want to. Similarly with Chelsea, but from a defensive standpoint. It doesn't matter who you are, if you play Chelsea, they won't let you have many chances. United don't look influential at all this season. If they play good teams they get beat. If they play lesser teams they beat them. This was not the case last season, when it was they who wrote the script.

This of course sounds pretty obvious, but if it can be measured from a team's numbers (possibly by correlating a team's performances against opponents with what is expected against said opponents) then it may prove a useful indicator of team ability on top of a 'raw power' metric like xGD. For example, maybe Chelsea, whilst not boasting such mighty goal/xG numbers, are more 'influential' than City right now?
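One hypothetical way to put a number on 'influence', my own sketch rather than anything tested in the post: regress a team's match-by-match xGD on the quality of the opposition. A team that imposes its own game shows little dependence on opponent strength (slope near zero); a team whose output is dictated by the opposition shows a steep slope. All data here is invented for illustration:

```python
def influence(match_xgd, opp_strength):
    """Least-squares slope of match xGD vs opponent strength.

    Closer to zero = more 'influential' (performance independent
    of who the opposition are)."""
    n = len(match_xgd)
    mx = sum(opp_strength) / n
    my = sum(match_xgd) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(opp_strength, match_xgd))
    var = sum((x - mx) ** 2 for x in opp_strength)
    return cov / var

# invented data: opponents' average xGD as a strength proxy
opp = [1.2, 0.8, 0.3, 0.0, -0.4, -0.9]

flat_team = [0.5, 0.4, 0.6, 0.5, 0.5, 0.6]         # performs to its own level
dependent_team = [-1.0, -0.5, 0.2, 0.4, 0.9, 1.3]  # mirrors the opposition

print(round(influence(flat_team, opp), 2))       # near 0 -> influential
print(round(influence(dependent_team, opp), 2))  # steeply negative -> not
```

A real version would need opponent-adjusted baselines and far more matches than six; this only shows the shape of the calculation.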

Monday, 24 March 2014

Daniel Sturridge in the Penalty Box with a Football

In this post I am going to take an even closer look at Daniel Sturridge's shot performance, this time by location. I do not think goalscoring is random. In a post a month or so ago looking at variation in shooting success among the 50 most frequent shooters over the last few seasons, the distribution looked decidedly normal. However, the variance is so large that it could appear random. After my post last week looking at Sturridge's shot data, it struck me that this same large variation will be present in his data, and that it would be worth breaking it down further to eliminate as much 'noise' as possible.

Expected goal models based on shot location are prone to error. For example, there are a lot of shots from outside the box in football, roughly half of all shots. Of the 230 Sturridge shots in my data, 80 have been from outside the box (34%). The expected goal value for such a shot is 3-4%, depending on the type of pass preceding it. A couple of goals from outside the box can really inflate a player's conversion numbers, especially with small sample sizes. I have deliberately chosen to use samples of just 20 shots in this analysis, with one eye on application. Things change so fast in football, and I believe stats really need to keep up with this to be meaningful.

I also imagine there will be relatively high error/variation with shots very close to goal (the six-yard box). This is me talking, not the stats (although I'll be sure to examine the data), but it strikes me that scoring from a few yards out is more about positioning, being in the right place at the right time, and very basic technique, rather than an inherent skill in accurately guiding a football past a goalkeeper.

My logic is that shots in the 6 yard box will be amongst the most polarized, either a player is presented with a very simple "tap-in" type chance, where the goalkeeper is out of position, or a chance falls to a player in close proximity to the 'keeper or defender and the actual scoring angle is really difficult. Both shots will be given the same expected goal value. Without data on goalkeeper/defender location  I believe any expected goals model will really struggle to capture what's going on here and thus will be error-prone. It should be possible to benchmark the expected variance by shot location and I'll be sure to follow this up for all the different shot zones.

For this reason I have decided to focus in this post on shots in the main penalty area and exclude shots in the 6 yard box and shots from outside the box. Not that these aren't important, of course they are, I just think there are quite different factors at play with them both and they should be treated separately. I'm also ruling out headed attempts. Again this is just a personal hunch which I will evaluate properly in due course, but I would assume headers have a different statistical profile to regular shots when it comes to accuracy, placement and goal conversion. This shouldn't impact Sturridge's shot data too much though, so let's get back to him and his shot history broken down by two key locations in the penalty area.

Daniel Sturridge in Penalty Box (Centre)



All these charts show a rolling average of 20 shots. Sturridge has had 46 of his 230 shots from the central area of the penalty box. Only 14 of these came whilst at Chelsea, where he had 102 shots in total (13%). Since joining Liverpool he's had 128 shots, 32 of them in Box (Centre) (25%).
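The rolling window behind these charts can be sketched like this; `shots` is a hypothetical list of 1 (goal) / 0 (no goal) outcomes in chronological order, not Sturridge's actual record:

```python
def rolling_conversion(shots, window=20):
    """Goal conversion over each trailing `window` shots."""
    return [sum(shots[i - window:i]) / window
            for i in range(window, len(shots) + 1)]

shots = [1, 0, 0, 0, 0] * 8          # invented: 40 shots at 20% conversion
series = rolling_conversion(shots)
print(len(series))   # 21 points from 40 shots
print(series[0])     # 0.2
```

Each chart point is therefore an overlapping 20-shot sample, which is why consecutive points move smoothly rather than jumping around.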

There's a lot less variation in this Goal Conversion chart than in the one in my last post looking at all shots, where Sturridge's Goal Conversion reached 300% of the average. He does a great job in the central box, but he's not been superhuman.

Comparatively, there is little variation in his shot accuracy. These charts show shot accuracy adjusted by expected accuracy, where 100% would be performing to league average by location, etc. There's a general 'eyeball trend' between shot accuracy and goalscoring, although the R² is only 0.4. He is good though, never falling below average, and we can see his 'shots in corner' accuracy has perhaps driven his increased goalscoring of late. Precisely how good I cannot say, and I will need to benchmark the performance of all players shooting from the different locations. A quick t-test against the accuracy of all shots from this location gives 97% confidence that Sturridge is better than average.
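The post doesn't give the details of that test, but for proportions like shot accuracy a one-sided z-test is a simple stand-in. The figures below (30 of 46 on target against a 50% league rate in this zone) are invented purely to show the mechanics:

```python
import math

def accuracy_above_average(on_target, shots, league_rate):
    """One-sided p-value: chance of seeing this accuracy or better
    if the player were exactly league average (normal approximation)."""
    p_hat = on_target / shots
    se = math.sqrt(league_rate * (1 - league_rate) / shots)
    z = (p_hat - league_rate) / se
    # one-sided upper-tail p-value from the standard normal CDF
    return 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))

# invented numbers: 30 of 46 on target vs a 50% league rate
p = accuracy_above_average(30, 46, 0.50)
print(f"confidence better than average: {1 - p:.0%}")
```

With samples this small, the normal approximation is rough; an exact binomial test would be the more careful choice.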

Daniel Sturridge in Penalty Box (Wide)

Whoah, this tells more of a story. Sturridge joined Liverpool from mark 13. Again this is a rolling average of 20 shots. He's clearly been on an upward trajectory since putting on a red shirt (once he got going), and it would seem a supreme level of shot accuracy has been the driving factor.

What could explain this? My first thought is that it is team related. Liverpool are obviously tearing it up at the moment, so this will be the subject of my next post, where I'll track Liverpool's performance in conversion and accuracy in these two shot locations and compare that with Sturridge, as well as try to normalise any team effect seen.

To wrap up, there's more to go on when looking at Sturridge's goal and shot data broken down by position than there was in my first post looking at all shots bundled together. As usual though, this has given me more questions than answers. But that's a good thing. Analysis is the process of breaking down a complex topic into smaller parts to gain a better understanding of it, and breaking down a player's shot data like this has given me a better understanding, or at least the next set of questions to answer.

Friday, 21 March 2014

"A History of Shots" starring Daniel Sturridge


I'm taking a short detour from testing stuff to have a closer look at some numbers. One thing I have noticed looking at the expected goal stats over the last few/several months is that, whilst it holds very true at the team level, it can get in a mess pretty quick-style with player data. This is particularly true when a player like Daniel Sturridge comes along so I've decided to take a closer look at his numbers to see what's going on. 

In this post I'm going to run through a bunch of different 'metrics', from good ol' goals per game to some more advanced stuff like shot placement. The stats are taken from the last two and a half seasons, from the beginning of 2011/12, when he was at Chelsea, to the Southampton v Liverpool game a week or so ago.



Goals per Game 


Sturridge was at Liverpool from game 30. Each point represents the goals scored over the previous 5 games, a rolling average. This is the type of stat you would have seen on this blog 10 years ago, had it existed. The narrative would be that Sturridge wasn't much kop at Chelsea but must have found Ian Rush's boots in his locker at Anfield. I wish I had some data from 2010/11, when he scored 8 goals in 12 games for Bolton (0.66 goals/game). This chart doesn't include those games at all, so he did do alright for himself at the start of 11/12 for Chelsea.

Goals per Shot

Goals per game is quickly dispensed with. I have 230 shots for Sturridge in my database, so the first step was to normalise Sturridge's goals on a per-shot basis. Each point is goals scored from the last 20 shots; 20 shots equates roughly to 5-6 games for Sturridge. Similar narrative though. Bright start. Big slump. Sort of recovered. Joined Liverpool. Slumped a bit again. Boom. Rushy's boots.

Chance Quality (a.k.a. Expected Goals per Shot) 
Everyone knows by now that not all shots are equal, so the chart above shows the average expected goals per shot based on the location of the shot, whether it was a header or not, and the type of pass (cross, through ball, etc). Again this is a rolling average over 20 shots. The narrative? It's up and down, but gradually up. At Chelsea his average chance was worth 0.10 goals; at Liverpool it's 0.13. This doesn't sound much, but it's 30% more, thus we'd expect Sturridge to score 30% more goals per shot at Liverpool based only on the quality of chances he's getting.

Chance Quality does not do a great job of explaining Sturridge's increased goals-per-shot ratio in his time at Liverpool though, and this should highlight my current struggle with expected goals as a metric for players, which I mentioned in the opening paragraph. It does not track well with his goalscoring chart (Goals per Shot). According to his Chance Quality, at Liverpool he should have been good for around 15 goals. He's actually scored 26. This leaves the big question. Is it skill? Is it luck? Is it Liverpool? Is the model not good enough? All of the above, I expect, but statistically we are on very thin ice.


Goal Conversion
This chart is analogous to the 'Goals per Shot' chart except now we adjust for the Chance Quality of each shot. I've marked the 'average' line at 1.0. This is the first time, in my opinion, we get some extra information from the more advanced stats. We can easily see how Sturridge varies above and below this average, which we couldn't do satisfactorily with Goals/Shots as we did not know whether the shots were "easier" or "harder" than an average shot.
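The adjustment can be sketched as rolling goals divided by the rolling expected goals of the same shots, so 1.0 means finishing exactly as the chance quality predicts. The data below is invented for illustration:

```python
def adjusted_conversion(goals, xg, window=20):
    """Rolling goals / expected goals over the last `window` shots."""
    out = []
    for i in range(window, len(goals) + 1):
        out.append(sum(goals[i - window:i]) / sum(xg[i - window:i]))
    return out

goals = [1, 0, 0, 0, 0] * 4            # invented: 4 goals from 20 shots
xg = [0.15] * 20                        # each chance worth 0.15 xG
print(adjusted_conversion(goals, xg))   # [~1.33]: finishing above expectation
```

A value of ~1.33 here says the player scored a third more goals than the quality of his chances predicted over that window.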

We can also see something new: Sturridge's "peak performance" came at the end of last season and the start of this one (around shot 146), not since he hit the headlines more recently. We can also see he matched this "peak" back at the very start of the 2011/12 season for Chelsea. Lots of "quotation marks" there, as we still cannot tell what caused this or how much it has to do with Sturridge himself.

Shot Accuracy

The top chart here shows Sturridge's Shot Accuracy, that is, shots on target divided by shots. It's pretty interesting in that he's remarkably consistent. Even during his "low accuracy" days midway through his Chelsea spell, he's still consistent-ish. It's interesting too that his low-accuracy period at Chelsea tentatively aligns with his worst goalscoring spell, although let's be honest, it doesn't explain the highs and lows well at all.

The second chart is an adjusted shot accuracy measure, weighted by the expected accuracy of each shot based on location, shot type and pass type, in the exact same way Expected Goals is done, except for accuracy. It generally does not differ much from, and is highly correlated with, standard Shot Accuracy.

It does have to be said though that Sturridge's two goalscoring peaks (from the Goal Conversion chart) do align with his two best spells of Expected Accuracy. Could this be important? We can also see Sturridge does a great job of finding the target: generally 1.5-2 times above average at best, and only average at worst.

Shot Placement 
This is the final chart, I promise. The thicker blue line shows the percentage of Sturridge's shots that were placed in one of the 4 corners of the goal. The faint line is his regular shot accuracy as shown above. You can see toward the end of the chart the two lines almost converge, which means almost all of Sturridge's on-target shots are toward the corners of the goal rather than hit centrally. A good thing, right?

Generally speaking, a shot to one of the corners is almost twice as likely (37%) to result in a goal as a shot hit centrally (20%). A shot is also more likely to be hit toward a corner (20%) than centrally (11%). If you split the goal into three sections, however, it's an even split and could well appear random. Sturridge's shot placement of late is very impressive, consistently finding the corner with ~45% of his shots, and he's displayed this characteristic over a good few periods in this data too. Is the assumption that the 'keeper is positioned centrally, and that hitting the corner is therefore a 'good thing', a viable one?
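A quick worked check of what those placement numbers imply, using the quoted conversion rates (corners ~37%, central ~20%) and treating the quoted 20%/11% shot shares as the league's corner/central split of on-target shots. The corner-heavy shooter below assumes ~45% of all shots find a corner out of ~50% on target; all of that framing is my own simplification:

```python
CORNER_RATE, CENTRAL_RATE = 0.37, 0.20  # quoted conversion rates

def expected_on_target_conversion(corner_share):
    """Blend of corner/central conversion, given the share of
    on-target shots placed in a corner."""
    return corner_share * CORNER_RATE + (1 - corner_share) * CENTRAL_RATE

league = expected_on_target_conversion(0.20 / (0.20 + 0.11))  # league split
corner_heavy = expected_on_target_conversion(0.45 / 0.50)     # Sturridge-like

print(round(league, 3))        # roughly 0.31
print(round(corner_heavy, 3))  # roughly 0.35
```

So under these assumptions, a shooter who finds the corner with nearly all of his on-target shots picks up several extra percentage points of conversion per shot on target, which is consistent with the chart's story.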

To Wrap Up
This doesn't prove or disprove anything, but I feel probing the data this way is essential to get under the skin of it. My expected goal model is based on a huge number and variety of shots from different players and teams, and I think it currently best models the system as a whole, that being football teams, which themselves comprise all the same vagaries and nuances that go into the model. Perhaps a player should be considered a sub-system, and not be rated or assessed on an expectation derived from the whole system (the team).

It's becoming evident from expected goal models that well over 90% of a team's goals can be attributed to the combined quality and quantity of chances the team creates, but in my experience the same cannot be said of the all-important leading goalscorers in the league. When a player is way above the average, what does this mean? That he's been lucky? That he's amazing? That he'll be signed by Real Madrid in the summer? Every season in every league there's a handful of top goalscorers whom I'll bet all out-perform the model.

For me, teams create chances; players score them. Digging deeper into Sturridge's shot history has revealed some possible pointers to what makes him tick and how he'll perform given the chances created for him. His shot accuracy and shot placement aren't stable, but there's a lot less variation in these than in his goals/shot trends. Plus, this is all based on rather small 20-shot samples, typically accumulated over 5-7 games for a striker like Sturridge.

I am hopeful that a profile of shooting stats like this will make a better subject to evaluate, so my next step will be taking a similarly close look at the same data for a number of other players. Van Persie, Suarez, Benteke, Michu, Ramsey, Jelavic and Papiss Cisse spring to mind as good candidates.

Thursday, 13 March 2014

Testing of Football Stats: Part 2 - "Predictive"

A couple of days ago I used a correlation matrix to compare a bunch of different underlying football stats and metrics with the top-level stuff like goals and points from the same set of games. A correlation matrix has to be considered a rough-and-ready first step to see where the best and worst relationships lie, rather than a conclusive test in itself.

Below is the correlation matrix for the same bunch of stats, but from one group of 38 games to the next 38, with no overlap. The method I've chosen uses rolling sets of 38 games rather than season-to-season or half-season-to-half-season comparisons. E.g. for each team I pair games 1-38 with games 39-76, then pair games 2-39 with 40-77, and so on.

Click me for a bigger me!
I’ve highlighted just the areas of prime interest, which is mostly Goals For, Against and Points Per Game (PPG).
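The rolling pairing scheme described above can be sketched as follows, assuming per-team match stats in chronological order. The toy data and the pure-Python Pearson correlation are just for illustration:

```python
import random

def lagged_pairs(values, window=38):
    """(mean of games i..i+window-1, mean of the following window games)."""
    pairs = []
    for i in range(len(values) - 2 * window + 1):
        past = sum(values[i:i + window]) / window
        future = sum(values[i + window:i + 2 * window]) / window
        pairs.append((past, future))
    return pairs

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

# invented toy data: 90 'games' of some stat for one team
random.seed(1)
xgd = [random.gauss(0.3, 0.5) for _ in range(90)]
pairs = lagged_pairs(xgd)
print(len(pairs))  # 90 - 76 + 1 = 15 overlapping windows per team
r = pearson([p for p, _ in pairs], [f for _, f in pairs])
```

Note the consecutive window pairs overlap heavily with each other, so the effective sample size is much smaller than the number of pairs, which is one reason to treat the matrix as a rough overview.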

A quick reminder that my primary objective is to predict goals scored over a short spell of matches for the fantasy football work I do over at InsideFPL although this analysis is so damn interesting who knows where it'll end up.

Starting with Goals For and Against then (GF+ & GA+, 1st and 4th columns), we can see the strongest correlations are with underlying stats rather than with goals.  This is pretty much known already but it’s good to see it like this and my favourite thing about a correlation matrix is how you can get a quick overview of all this.

Expected Goals (xG) beats Expected Goals from SOT (xGT). I mention this specifically because in the last post xGT had a stronger correlation with Goals when looking back, but here xG appears the better looking forward. This suggests to me that whilst getting shots on target will likely yield more goals, it's not necessarily a sustainable metric. I should really have included Shot Accuracy in this analysis in the same way I did with other "skill metrics" like Goal Conversion, but alas, next time.

The two ratios I’m fond of right now (and spending far too much time with) are looking good in the matrix. Again I hasten to add that a correlation matrix is just a high-level overview, but hopefully you can see why I've grown fond of them. They make sense too. If ‘Expected Goals’ is an acceptable measure of a team’s ability to create goalscoring chances, then expected goals difference (xGD) seems the next logical step. I could probably waffle on for too long about the subjective difference between xGR (Expected Goals Ratio) and xGD, how one accounts for “Control” and the other “Power”, but so far my testing has shown no significant difference between the two, so I’ll leave it at that for now.

‘Predicted xGD’ was my quick attempt to adjust for opposition strength in my data, although this shouldn't be significant over 38 matches. It is simply each team's 38-game average xGD minus the opponent's xGD on a match-by-match basis, averaged up over the 38 games. It’s actually what led me to get carried away and build a ScorePrediction Tool (still in beta testing, use with caution!).
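That adjustment can be sketched like this; the team names and average xGD figures are invented for illustration:

```python
# invented: each team's 38-game average xGD
team_avg_xgd = {"A": 0.6, "B": 0.1, "C": -0.4}

def predicted_xgd(team, fixtures):
    """Average of (own avg xGD - opponent avg xGD) over a run of fixtures."""
    adjustments = [team_avg_xgd[team] - team_avg_xgd[opp] for opp in fixtures]
    return sum(adjustments) / len(adjustments)

# a run of fixtures against invented opponents
print(predicted_xgd("A", ["B", "C", "B", "C"]))  # ~0.75
```

Over a full 38-game set the opponent terms largely cancel out (everyone plays everyone), which is why the post notes the adjustment shouldn't be significant at that window size; it matters most over short runs with uneven fixtures.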

This is all looking pretty healthy at this point and confirms what others have done recently (see Michael Caley - What is the Best Method for Predicting Football Matches?). However, as with all healthy dishes, a pinch of salt is required. Following Michael's example, my next step was to take a closer look at some of the most promising metrics and perform error testing. The results of that will be my next post, and whilst it does pour some water on these initial results, it also pointed out some new paths for getting much closer to the data and pulling it apart some more, rather than looking at it all reduced to single numbers (e.g. why did Chelsea have such a big Points-to-xGD error under Benitez?).

I'm also pretty keen to start looking at Player Impact on xGD, both relative to team and league averages, and start investigating some key player stuff as well as understand some of the variation I'm seeing in team trends.

Monday, 10 March 2014

Testing of Football Stats: Part 1 - Descriptive

As mentioned in my post last week I've done some testing of a few different football metrics to gauge how well they describe what has happened and how well they can predict or project what will happen in the future. 

My first step was to do a 'broad sweep' using a Correlation Matrix, this being a grid describing the correlations between multiple variables. With the historical fixture data I've now got (as displayed in these tables - not updated from the weekend yet) I am able to line up every team's goals for/against, expected goals, shots, shots on target, points, etc. chronologically, and then look at the average of each stat from the past x games and/or the average of each team's next x games.

To start with I've looked back and forward 38 games, and with the method I've used this doesn't split the data into chunks of one season to the next, but looks at a rolling 38-game trend.


What Just Happened?

The first correlation matrix is descriptive. It's looking at how well stats from the last 38 games correlate with other stats from the last 38 games.


Click = Big

The colour code is Excel's conditional formatting; Blue is the strongest positive correlation, red is the strongest negative. 

The two things I am most interested in are Goals For (GF) and Points Per Game (PPG). Looking straight at Goals Scored then (the top row), the strongest correlation is with PPG. This means that if you didn't know how many goals a team had scored, the points they'd accumulated would give you the best idea. Looking at Points per Game (PPG) (last column), we can see the same is true of points: Goals For is the strongest correlation. This confirms what we know of course about football. Goals win games.

The strongest underlying stat describing Goals For is xGT. This is a secondary version of my Expected Goals model which uses only shots on target instead of all shots. I've noticed it is a better descriptor of actual goals in the past and in previous testing. However, it doesn't feel right to 'throw away' any shot that is off target or blocked. A penalty is a good example of why. If a team has a penalty they are expected to score it; a penalty has an xG value in my model of 0.78. However, if the player balloons said penalty over the bar, the xGT model rates it as 0.0, as if it never happened. I can understand why xGT trumps xG as a descriptive metric of goals scored: getting more and better-quality shots on target is of course going to lead to more goals. But using xGT would assume that shot accuracy is a sustainable skill, which is not something I can readily assume at this point. Also, as you'll see in my follow-up post, xGT is not as good a predictor of future goals as xG, which goes some way to backing up its non-sustainability. However, I do believe shot accuracy could be an important bridge between shots/xG and goals/results, so I will definitely have to investigate it further.
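The penalty example can be shown in a couple of lines. The 0.78 penalty value is the post's; the other shot values are invented, and this simplifies by giving xGT the same per-shot values as xG, which matches the "as if it never happened" description:

```python
# toy shot log: xG value and whether the shot hit the target
shots = [
    {"xg": 0.78, "on_target": False},  # the ballooned penalty
    {"xg": 0.10, "on_target": True},
    {"xg": 0.05, "on_target": False},
]

xg = sum(s["xg"] for s in shots)                       # counts every shot
xgt = sum(s["xg"] for s in shots if s["on_target"])    # only on-target shots

print(round(xg, 2))   # 0.93 -> the missed penalty still counts
print(round(xgt, 2))  # 0.1  -> as if the penalty never happened
```

So a single wayward penalty moves the two metrics almost 0.8 expected goals apart for the same set of shots.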

I've included a couple of my own metrics, xGR and xGD (and their xGT versions). xGR is the Expected Goals ratio, xGF / (xGF + xGA); xGD is Expected Goal difference, xGF - xGA. These are the two metrics I was most interested in testing as simple, single values to describe a team's overall strength. You can see the correlation between xGR and xGD is almost 1.0, and in practice I believe there's really little difference between them, but of the two xGD does a slightly better job of explaining goals and points, although their individual components look better.
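For reference, the two summaries as defined above, with invented season totals:

```python
def xgr(xgf, xga):
    """Expected Goals ratio: your share of the total expected goals."""
    return xgf / (xgf + xga)

def xgd(xgf, xga):
    """Expected Goal difference."""
    return xgf - xga

# invented season totals: 60 xG for, 40 xG against
print(xgr(60.0, 40.0))  # 0.6
print(xgd(60.0, 40.0))  # 20.0
```

One practical difference: xGR is bounded between 0 and 1 and dampens blowouts, whilst xGD scales with the raw margin, which may be why they diverge slightly when explaining goals.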

The poorest relationships in the matrix between goals and underlying data are with Goals Against. This again is something I've observed over the last couple of seasons. The correlations aren't bad, but overall they are not as good as those for Goals For, particularly with PPG, and the same could be said of actual goals. It doesn't look like conceding goals matters that much to results, with the proviso that you are scoring them too. Sounds obvious, right? But what it means to me is that being a team who wins 4-2 on average is better than winning 1-0 on average.

I've also thrown in a few oddball-type metrics, namely Chance Quality (xGF/shots, xGA/shots) and Goal Conversion (GF%, GA%), as well as xGDO, which is a combination of the latter two similar to PDO. These are all the missing links between underlying stats like xG and the real thing. I was curious to see how their correlations compared, and you can see they are all pretty weak. They obviously are really important, especially in important games like a 6-pointer against league rivals, but I think this demonstrates that the quantity and quality of chances is still the biggest determinant of overall success, rather than boasting high chance-conversion or save stats.
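The post doesn't spell out xGDO's exact formula. By analogy with PDO (shooting% plus save%), one plausible construction that centres on 1.00, purely my guess for illustration, is the average of attacking over-performance (GF/xGF) and defensive over-performance (xGA/GA):

```python
def xgdo(gf, xgf, ga, xga):
    """Hypothetical PDO-style index; 1.0 = results match the chances.

    Above 1.0 means finishing and/or goalkeeping are beating the
    expected-goals model; below 1.0 means under-performing it."""
    return ((gf / xgf) + (xga / ga)) / 2

print(xgdo(gf=40, xgf=40, ga=30, xga=30))  # 1.0: nothing over-performed
print(xgdo(gf=50, xgf=40, ga=25, xga=30))  # above 1.0: hot finishing + saves
```

Whatever the exact construction, the key property is the one described later in the blog: it regresses toward 1.00 and flags teams whose results are running ahead of their underlying numbers.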

There's obviously a lot to look at in this matrix so if anyone spots anything interesting or contradictory that I've missed please discuss it with me in the Comments section. Also, as mentioned briefly at the beginning, a correlation matrix is only a coarse view of how a bunch of variables stack up, a quick heads-up, if you will.  Next up I'll look at What Happens Next, using the same data set to examine correlations between results and stats from a team's last 38 games and their next 38. 

Wednesday, 5 March 2014

Latest Form and ExpG data updated

I've updated my Expected Goal by Fixture data table with the games from the last weekend. 

Some quick notes:

Chelsea and Newcastle deserve full credit for their goalscoring performances, both achieving the highest expected goals scored. Liverpool too, although to a lesser degree. Everton and Hull appear to have been the two teams which failed to take the chances they created.

Defensively, Everton, Stoke and Crystal Palace limited opposition's chances the most and the first two can say they deserved their clean sheets. Well done Stoke.

10 Week Form (xGR)



If you enter an asterisk (*) into the Search Box and then sort the table by clicking on the 10wk GR column header (click twice for descending order), you'll get the above view.

xGR is each team's expected goal ratio from the last 10 games, calculated as xGfor / (xGfor + xGagainst). I've done a lot of testing over the weekend and it correlates very well with goals and points scored, both present and future. I'll be posting more on this soon, but suffice to say it's an excellent descriptor and predictor of future goals and points (much better than goals and points themselves, and better than xG on its own). Like I said, more on this coming soon; for now I'll put it forward as a very good indicator of a team's overall ability right now to score goals and win games.

As you can see from the table, City are top trumps, with Chelsea leading the chasing pack a little behind. Chelsea's xGDO though (which is my bastardised version of PDO) shows they've been getting results above and beyond those explained by the raw stats. Whether this is due to limits of the model, Mourinho's tactical nous or pure luck, who can say. All of the above, most likely. However, it does mean that City remain the best team in the league right now. xGDO, just like PDO, regresses to the mean (1.00) and is not sustainable, nor is it an able predictor of things to come. I'll post a little more on both teams' recent performances using data from the tables. Chelsea will have to ride this wave for the remainder of the season if they want to win the league, and hope City don't catch one of their own.