Wednesday 22 January 2014

Expected Goals Model with Game State

Here's a look at the latest update to the model, with Game State added to the equation. Game State is the current score in a match at the moment a shot is taken. I've included three states:

  • Close - score is even, or one team is winning by a single goal
  • Up - Team is winning by two or more goals
  • Down - Team is losing by two or more goals
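As a rough illustration, the three states above could be assigned from the score at the moment of each shot, and a basic conversion rate (goals divided by shots) worked out for each state. This is a minimal Python sketch, not the model's actual code; the (team_goals, opponent_goals, scored) tuple layout and the function names are hypothetical.

```python
from collections import defaultdict


def game_state(team_goals: int, opponent_goals: int) -> str:
    """Classify the score when a shot is taken, from the shooting team's view."""
    diff = team_goals - opponent_goals
    if diff >= 2:
        return "Up"      # winning by two or more goals
    if diff <= -2:
        return "Down"    # losing by two or more goals
    return "Close"       # level, or one goal either way


def conversion_by_state(shots):
    """shots: iterable of (team_goals, opponent_goals, scored) tuples (hypothetical layout).

    Returns {state: goals / shots} for every Game State that appears.
    """
    totals = defaultdict(lambda: [0, 0])  # state -> [goals, shots]
    for team_goals, opponent_goals, scored in shots:
        state = game_state(team_goals, opponent_goals)
        totals[state][0] += int(scored)
        totals[state][1] += 1
    return {state: goals / n for state, (goals, n) in totals.items()}
```

For example, conversion_by_state([(0, 0, False), (0, 0, True), (2, 0, True), (0, 3, False)]) gives a rate of 0.5 for Close, 1.0 for Up and 0.0 for Down.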
The logic is that when the score is "Close", teams don't convert their chances quite as easily as when they are leading, or "Up". Teams that are "Down" also convert a little better than when the game is close. The following chart shows very basic conversion rates for all shots:


The model is more sophisticated than this, however. Rather than just biasing each shot by a set value, I look at Game State for each classification of shot (the different zones, different assist types, etc.). I'm not sure whether this is necessary, though. Anyway, the results...
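One way to picture the per-classification approach is a lookup table of historical conversion rates keyed by zone, assist type and Game State, where each shot's xG is simply the rate for its class. The sketch below assumes hypothetical dictionary keys (zone, assist, state, scored) and a default rate for classes never seen in the history; how the real model handles sparse classes isn't described in the post.

```python
from collections import defaultdict


def build_rate_table(shots):
    """shots: iterable of dicts with hypothetical keys 'zone', 'assist', 'state', 'scored'.

    Returns {(zone, assist, state): conversion rate} from historical shots.
    """
    counts = defaultdict(lambda: [0, 0])  # class key -> [goals, shots]
    for s in shots:
        key = (s["zone"], s["assist"], s["state"])
        counts[key][0] += int(s["scored"])
        counts[key][1] += 1
    return {key: goals / n for key, (goals, n) in counts.items()}


def expected_goals(shots, rates, default=0.0):
    """Sum the historical conversion rate of each shot's (zone, assist, state) class.

    Classes missing from the table fall back to `default` (an assumption).
    """
    return sum(rates.get((s["zone"], s["assist"], s["state"]), default) for s in shots)
```

Summing expected_goals over a team's shots gives the team total that is plotted against its actual goals.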

xG Model with Game State

Pretty good, right? It looks like an improvement (and is) over the non-Game State results I posted on Monday. However, a small confession: when I was adding in Game State, I noticed that some data for Zone A shots (6-yard box) for the current season had been corrupted, so I had to fix that. Here's what the chart from Monday should have looked like...

xG Model without Game State

Not a great deal of difference, but importantly I think the top clubs are just a little closer to the line with Game State included than without, which makes sense, as it's these clubs that are more often "Up". Also, the total goals for the Game State (GS) model is only 12 out (2573 expected compared to 2585 actual), whereas the non-Game State model is 152 below the actual. The RMS error for the GS model is also a little better: 8.5 compared to 10.8.
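For anyone wanting to reproduce the two summary numbers, here's a minimal sketch of the total-goals gap and the RMS error. It assumes one expected/actual pair per team per season (the post doesn't say exactly how the points were grouped), and the function names are made up for illustration.

```python
import math


def total_gap(expected, actual):
    """Actual minus expected goals, summed over all teams (positive = model under-predicts)."""
    return sum(actual) - sum(expected)


def rms_error(expected, actual):
    """Root-mean-square difference between expected and actual goals per team."""
    if len(expected) != len(actual) or not expected:
        raise ValueError("need two equal-length, non-empty lists")
    squared = [(e - a) ** 2 for e, a in zip(expected, actual)]
    return math.sqrt(sum(squared) / len(squared))
```

With the totals quoted above, total_gap([2573], [2585]) gives 12.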

Comparison with Most Basic Shot Model
One final thing to look at is how this model compares with the most basic approach: expected goals from total shots, treating all shots as equal...


The R^2 value is good! If you looked at it without the chart alongside, you'd think you had a good relationship, but as can be seen very clearly, it's at the top end that all the extra shot classification built into the xG model proves to be of immense value.
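The "all shots equal" baseline simply gives every shot the league-average conversion rate, so a team's expected goals are its shot count multiplied by that rate. The sketch below pairs that with R^2 computed as the squared Pearson correlation, which is what a least-squares linear trendline in a spreadsheet reports; the function names are hypothetical.

```python
def basic_xg(team_shots, total_goals, total_shots):
    """Every shot is worth the league-average conversion rate."""
    rate = total_goals / total_shots
    return [shots * rate for shots in team_shots]


def r_squared(x, y):
    """Squared Pearson correlation, i.e. the R^2 of a least-squares line fitted to y against x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)
```

Note that R^2 only measures how well the points fit some straight line: r_squared([10, 20, 30], [20, 40, 60]) is 1.0 even though every prediction is half the actual value, which is why the chart matters as much as the number.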

All very interesting, but what next? Firstly, I'm really going to try to get this latest model built into the insideFPL stats and projections for Fantasy Premier League. I also want to take a look at the performance of the different models from season to season.

I will also look at some data for individual players. Over the past few seasons players have racked up hundreds of shots, and there is some very interesting stuff going on. Some players can be shown to have consistently converted their shots at twice the league-average rate over the seasons.








Monday 20 January 2014

Expected Goals vs. Actual - 2011/12 to present

Since my last post I've been back and grabbed a load more data from 2011/12 to add to the 2012/13 set and the current season I've been doing over at insideFPL, as well as making some adjustments to the model (adding through balls as an assist type, for example).

With the extra data I've been able to find out some ace stuff, which will follow over the next few days/weeks, including some cool player info that goes a long way to explaining why Yaya Toure and Yohan Cabaye are scoring a load of goals this season. Hopefully I can get some of this built into the insideFPL projections and stats before the next gameweek.

For now, though, here is a quick look at the two and a half seasons of data as a complete set, and the awesome correlation between expected goals and actual goals. You can read a little more about expected goals here.

As you can see, there seems to be a bit of Big Club superiority going on at the high end. This could well be due to Game State, a.k.a. score effects, and said clubs' propensity for the odd thrashing and runaway result. Game State is not built into the model yet, but I have it coded in the data, so it's next on the 'to do' list and should be pretty simple.