Friday, 28 September 2012

FPL Point Forecasting

It's about time I got around to describing how I am forecasting FPL points scored for all players in the Premier League.  This forms the foundation of my Gameweek Select XI, players featured on the Buying Guide, Captain Options and WildCard Now!

This is going to get quite in depth and technical so if you cannot be bothered with it all then I do not blame you.  Maybe go watch this instead and have a laugh.

I start off by working out how many shots on target (SoT) a team is likely to have against their opposition.  I use shots on target instead of actual goals scored because this is more representative of a team's performance than the actual goals scored.  The SoT data set also accumulates faster than goals because there are more shots on target than goals.  E.g. in a 0-0 draw one team may have had 6 shots on target and been unlucky not to have scored, whilst the other team might have managed just 2 shots on target.  Which team would you fancy to score if they replayed the game?

Two recent matches that highlight the point are Stoke 0-0 Arsenal and Norwich 0-0 West Ham.  In the STO/ARS game Stoke had 1 shot on target and Arsenal had 2.  Not a great difference, and neither team can feel aggrieved with a 0-0.  In the NOR/WHM game Norwich had 9 SoT compared to West Ham's 3.  I'd venture that Norwich were much more the attacking team and West Ham were fortunate not to concede at least one.

Using SoT in this way is very beneficial as a form indicator.  For example, right after the 1st game of the season I knew that Everton were on form.  They had 7 shots on target against Man Utd, winning 1-0.  A 1-0 scoreline could mean anything, and typically for Everton would mean nicking a goal and then parking the bus.  But the 7 SoT tells a different story.  Utd conceded on average fewer than 4 SoT away from home last year and only conceded 7 or more SoT away from home twice last season, against Chelsea and, ironically, at Everton.
 
The other reason to focus on shots on target instead of goals scored is that the data is much more normally distributed than the data for goals scored, which gets truncated close to zero.  Compare the two figures below.  Left is the distribution of SoT for all home games in 2011/12 (SOTh); right is goals in all home games (Goals H).


Fig. 1  All Teams' SoT Home
Fig. 2 All Teams' GF Home



The Goals H (goals scored at home) distribution is truncated close to zero.  No matter how shite you are you cannot score less than zero goals.  However, if you are very shite you can have zero shots on target, although this only happened 4 times last season.  If you know anything about statistics you'll recognise the advantage, especially for predictive work, of the SOTh distribution over the Goals distribution.  That the distribution is close to a Normal Distribution is also ace.

The data above is for all teams playing at home, with an average of ~5 shots on target per match for the home team.  Teams are not average though: Man Utd and City averaged closer to 8 shots on target at home last season, whereas teams like Aston Villa and QPR averaged more like 3 or 4.  Obviously this varies with opposition, but the count of SoT for each team at home also shows a Normal Distribution.  See below for Man Utd at home 2011/12; the chart on the right is their SOTh distribution.
Fig. 3  Man Utd SoT in all home games 2011/12

For each team in the Premiership I work out where they "fit" into the SoT distribution curve for the league on average (Fig. 1).  To do this, I work out how many standard deviations each team's average is from the league average.  Manchester United averaging 7.5 SoT are 1.8 standard deviations above the league average of 5.  Aston Villa (last year) averaging 3.9 SoT at home are 0.8 standard deviations below the league average.
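
As a rough sketch of the standard-score step in Python: the league average of 5 and the team averages are the ones quoted above, while the league SD of 2.83 is an assumption taken from the comments below, and the function name is just for illustration.  With that SD the plain calculation comes out at roughly +0.88 and -0.39 rather than the figures quoted above; see the comments for the discussion of that difference.

```python
def standard_score(team_avg, league_avg, league_sd):
    """How many standard deviations a team's average sits from the league average."""
    return (team_avg - league_avg) / league_sd


LEAGUE_AVG_SOT_HOME = 5.0   # league average SoT per home game, as quoted above
LEAGUE_SD_SOT_HOME = 2.83   # assumption: SD figure taken from the comments

man_utd_z = standard_score(7.5, LEAGUE_AVG_SOT_HOME, LEAGUE_SD_SOT_HOME)
aston_villa_z = standard_score(3.9, LEAGUE_AVG_SOT_HOME, LEAGUE_SD_SOT_HOME)

# Positive means above the league average, negative means below it.
print(round(man_utd_z, 2), round(aston_villa_z, 2))
```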

I do exactly the same thing for SoT away from home, and SoT conceded both home and away.
This gives me a numerical indicator for each team, both in attack and in defence, and when two teams play each other I can combine their indicators to give me a statistically predicted shots on target for and against for both teams.
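
One possible way to combine the two indicators, purely as an illustration: add the attacking team's indicator to the opponent's "SoT conceded" indicator, convert back to shots on target using the league average and SD, and floor the result at zero.  The additive rule, the floor, the function name and the example numbers are all assumptions made for this sketch, not a description of the exact combination I use.

```python
def predicted_sot(z_attack, z_conceded, league_avg, league_sd):
    """Hypothetical combination: sum the two standard scores, then map back to SoT.

    z_attack   -- attacking team's SoT indicator (standard score)
    z_conceded -- opponent's SoT-conceded indicator (standard score)
    """
    combined_z = z_attack + z_conceded
    # A prediction below zero shots makes no sense, so floor it at zero.
    return max(0.0, league_avg + combined_z * league_sd)


# Illustrative numbers only: a strong home attack (+0.88) against an away
# defence that concedes fewer SoT than average (-0.38).
home_sot = predicted_sot(0.88, -0.38, league_avg=5.0, league_sd=2.83)
print(round(home_sot, 2))
```

The floor at zero is there because a pair of strongly negative indicators would otherwise push the prediction below zero shots.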

This of course ignores a team's ability to convert chances into goals so I measure this too, as well as a team's ability to prevent their opponent turning chances into goals.  This value is capped at either end (40% to 30%) for converting/preventing chances into goals.  This capping minimises outliers (e.g. after 3 games this season Arsenal, having not conceded a goal, would have a perfect 100% chance prevention, which is wrong).

Combining the expected SoT with conversion rate gives me the expected goals.
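
A minimal sketch of the capping and the multiplication, assuming the rate is expressed as goals per shot on target and clamped to the 30%-40% range mentioned above.  Treating the conceding side the same way (so a team that has conceded nothing is pulled up to 30%) is my reading for the sake of the example, and the function names are hypothetical.

```python
MIN_RATE = 0.30  # lower cap on goals per shot on target
MAX_RATE = 0.40  # upper cap on goals per shot on target


def capped_rate(raw_rate):
    """Clamp a measured SoT-to-goal rate into the capped range."""
    return min(MAX_RATE, max(MIN_RATE, raw_rate))


def expected_goals(expected_sot, raw_conversion_rate):
    """Expected goals = expected shots on target x capped conversion rate."""
    return expected_sot * capped_rate(raw_conversion_rate)


# A team that has conceded 0 goals from its opponents' SoT so far has a raw
# rate of 0.0, which the cap pulls up to 0.30 rather than trusting 3 games.
print(capped_rate(0.0), capped_rate(1.0))
print(round(expected_goals(6.0, 0.35), 2))
```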

Next the focus is on the player.  I rack up player data for total shots, shots in the box, shots on target, and key passes, and aggregate this into percentages for goal potential and assist potential for each player.  These are then simply multiplied by the team's expected goals, and then by the FPL points by position.
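
The multiplication can be sketched as below.  The points values are the standard FPL ones (6 for a goal by a goalkeeper or defender, 5 for a midfielder, 4 for a forward, 3 for an assist); the function name and the example percentages are made up for illustration, and treating every expected goal as carrying an assist is a simplifying assumption.

```python
GOAL_POINTS = {"GK": 6, "DEF": 6, "MID": 5, "FWD": 4}  # FPL points per goal
ASSIST_POINTS = 3                                       # FPL points per assist


def expected_attacking_points(team_xg, goal_potential, assist_potential, position):
    """Expected FPL points from goals and assists for one player in one match.

    team_xg          -- the team's expected goals for the fixture
    goal_potential   -- player's share of the team's goals (0 to 1)
    assist_potential -- player's share of the team's assists (0 to 1)
    """
    goal_pts = team_xg * goal_potential * GOAL_POINTS[position]
    # Simplifying assumption: each expected goal comes with one assist.
    assist_pts = team_xg * assist_potential * ASSIST_POINTS
    return goal_pts + assist_pts


# Hypothetical midfielder: 25% goal potential, 10% assist potential,
# in a team expected to score 2 goals.
print(round(expected_attacking_points(2.0, 0.25, 0.10, "MID"), 2))
```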

A player's goal potential is calculated from a few factors based on data from this season.
  1. Their no. of shots taken as a % of team shots taken, per 90mins.
  2. Their no. of SoT taken as a % of team SoT, per 90mins. 
  3. Their position dictates their baseline SoT to Goal conversion rate (Def: 3.5, Mid 2.8, Fwd 2.5).
  4. Their % shots in the box vs. all shots affects their goal conversion rate.  100% shots in the box improves shot to goal conversion rate by 30%.
Assist potential is no. of key passes as a % of the team's total key passes.
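
A small sketch of the share calculations in the list above.  The per-90 scaling and the shares follow the list directly; the linear scaling of the in-the-box bonus between 0% and 100% is an assumption (only the 100% case is stated above), and all names and example figures are hypothetical.

```python
def per_90(count, minutes_played):
    """Scale a raw count to a per-90-minutes rate."""
    return count * 90.0 / minutes_played


def share(player_value, team_value):
    """Player's value as a fraction of the team's value (0 if the team has none)."""
    return player_value / team_value if team_value else 0.0


def box_multiplier(box_shot_fraction):
    """Conversion-rate boost from shooting inside the box.

    100% of shots in the box gives +30%; scaling linearly in between
    is an assumption made for this sketch.
    """
    return 1.0 + 0.30 * box_shot_fraction


# Hypothetical player: 12 shots in 360 minutes, in a team with 60 shots
# over 4 full matches (360 minutes), and 8 of the team's 40 key passes.
shot_share = share(per_90(12, 360), per_90(60, 360))
assist_potential = share(8, 40)
print(round(shot_share, 2), round(assist_potential, 2), round(box_multiplier(1.0), 2))
```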


Clean Sheet points potential is also factored in by working out the expected goals conceded and converting this into expected FPL points.
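
One common way to turn expected goals conceded into a clean sheet chance is to assume goals follow a Poisson distribution, so that the chance of conceding zero is exp(-expected goals conceded).  That Poisson step is an assumption for this sketch, not necessarily the conversion I use; the clean sheet points (4 for goalkeepers and defenders, 1 for midfielders, 0 for forwards) are the standard FPL values.

```python
import math

CLEAN_SHEET_POINTS = {"GK": 4, "DEF": 4, "MID": 1, "FWD": 0}  # FPL clean sheet points


def clean_sheet_probability(expected_goals_conceded):
    """P(0 goals conceded), assuming goals conceded are Poisson distributed."""
    return math.exp(-expected_goals_conceded)


def expected_clean_sheet_points(expected_goals_conceded, position):
    """Clean sheet probability times the FPL clean sheet points for the position."""
    return clean_sheet_probability(expected_goals_conceded) * CLEAN_SHEET_POINTS[position]


# Hypothetical defender whose team is expected to concede 1.0 goals.
print(round(expected_clean_sheet_points(1.0, "DEF"), 2))
```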

I am fed up now so will leave it at that.  All typos come free of charge :)









7 comments:

  1. Ste -

    Awesome stuff. Thanks for linking PL Fant Blog.

    As I mentioned there, I would love to talk through some of your stuff in a less public forum as I am doing the exact same kinds of things statistically.

    Is Twitter the best method to get in touch?

  2. Hi john, very keen to discuss. Mail me at shots_on_target@hotmail.co.uk. Thanks.

  3. Hi,

    Kudos for the great article. I am trying to understand one thing here. When you say you work out how many standard deviations each team's average SOT is from the league average SOT, what do you mean?

    Isn't standard deviation always supposed to be positive? So when you say Aston Villa is 0.8 standard deviations below the league average, how did you arrive at that number?

  4. The standard score (the number of SDs from the mean) can be positive or negative. Positive is above the mean (i.e. above average). Negative is below. The avg. lge SOT/home game was 5, with an SD of 2.83, and Villa's home average SOT was 3.9. You then need to normalise Villa's value within the league's distribution. http://en.wikipedia.org/wiki/Standard_score

    Replies
    1. Ok, thanks a lot.

      Btw if you are using averages to calculate the standard score, won't the scores for Man Utd and Aston Villa be the following:

      Man Utd = (7.5-5)/2.83 = +0.88
      Aston Villa = (3.9-5)/2.83 = -0.38

  5. I think you are right Karan. I actually had to adjust these values (halve them) as they would send some predictions to a negative value. I have tested them with this adjustment by running the values through all of last season's 700+ fixtures with 99% accuracy.

  6. Wow wow.. this is great~~ keep up the good work!
