What xG really measures (and what it doesn’t), the key misconception that trips up even seasoned pros, why its loudest critics are mostly wrong, and how to pull tons of xG data with Python.
Thanks for this and really enjoying the posts.
Do you have any advice on creating a model to measure xG using your own approximate shot coordinates, for youth games or lower-level matches?
I have used ChatGPT to develop something and it seems to hold up fairly well to existing models but would love to get your feedback.
Thank you again.
Hi Gavin, thank you for the feedback.
I haven't built an xG model myself yet, but it is on my to-do list for future newsletter issues :)
If you've already built one with the data at your disposal, I would check out the publicly available data (think Understat, or StatsBomb's free 2015/16 data) and see how the model performs on a larger, different dataset.
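As a rough illustration of what that check could look like, here is a minimal sketch. It assumes shot coordinates in metres on a 105 x 68 pitch with the attacked goal at x = 105, which is my assumption and not the convention of any particular data provider. The function names are hypothetical. It turns approximate coordinates into the two classic xG features (distance and goal-mouth angle) and scores a set of predictions with the Brier score and log loss.

```python
import math

# Assumed pitch convention (not tied to any provider): metres, 105 x 68,
# attacking the goal whose centre sits at (105, 34).
GOAL_X, GOAL_Y = 105.0, 34.0
GOAL_WIDTH = 7.32  # distance between the posts in metres


def shot_features(x, y):
    """Return (distance to goal centre, angle subtended by the goal mouth in radians)."""
    dx = GOAL_X - x          # distance from the goal line
    dy = y - GOAL_Y          # lateral offset from the centre of the goal
    distance = math.hypot(dx, dy)
    # Angle between the lines from the shot location to each post
    angle = math.atan2(GOAL_WIDTH * dx, dx**2 + dy**2 - (GOAL_WIDTH / 2) ** 2)
    return distance, angle


def brier_score(predictions, outcomes):
    """Mean squared error between predicted xG and the 0/1 goal outcome (lower is better)."""
    return sum((p - o) ** 2 for p, o in zip(predictions, outcomes)) / len(outcomes)


def log_loss(predictions, outcomes, eps=1e-15):
    """Average negative log-likelihood of the outcomes (lower is better)."""
    total = 0.0
    for p, o in zip(predictions, outcomes):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(o * math.log(p) + (1 - o) * math.log(1 - p))
    return total / len(outcomes)


def baseline_predictions(outcomes):
    """Naive baseline: give every shot the overall conversion rate."""
    rate = sum(outcomes) / len(outcomes)
    return [rate] * len(outcomes)
```

A useful sanity check is to compare your model's Brier score and log loss on the public data against `baseline_predictions`: if the model cannot beat "every shot gets the average conversion rate", the hand-collected coordinates are probably too noisy or the model is overfitted to your own games.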
I am a beginner, and the deep dive into the football concept along with real-life examples really helps. It never felt like a long post. I was glued till the end. The Python code snippets also help because they provide a starting point for further analysis. Thank you for writing this post!
So glad it helped 🙏
What does "the average player" actually mean?
Hi Achraff, could you clarify your question a bit more?
If xG measures the chance that an average player converts a shot, then those who outperform it are logically better finishers. But you also mention it's a misread to measure finishing by comparing goals and xG. That seems a bit contradictory to me, and that's why I was asking what the "average player" means here.
Aah, I see now. That's a great point you're making, and it does seem contradictory. It took me a while to grasp this point as well.
Over a short period (a few months or a season), comparing goals to xG can be useful to identify an overperforming striker or someone on a hot streak.
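To get a feel for how noisy a single season is, here is a small sketch. It assumes each shot is an independent coin flip with probability equal to its xG, which is a simplification, and the function name and the 100-shot season are purely illustrative.

```python
import math


def goals_spread(xg_values):
    """Expected goals and the standard deviation of goals for a list of shot xG values.

    Assumes shots are independent, so the variance is the sum of p * (1 - p).
    """
    expected = sum(xg_values)
    variance = sum(p * (1 - p) for p in xg_values)
    return expected, math.sqrt(variance)


# Illustrative season: 100 shots, each worth 0.1 xG
season = [0.1] * 100
expected, sd = goals_spread(season)
print(f"expected goals: {expected:.1f}, standard deviation: {sd:.1f}")
```

Under these assumptions the season is worth about 10 expected goals with a standard deviation of about 3, so a striker finishing on 13 goals is only around one standard deviation above expectation. That is well within what luck alone can produce.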
But as Ian Graham and James Tippett point out in their books "How to Win the Premier League" and "xGenius" respectively, over the long term very few attackers actually outperform their xG (even Cristiano Ronaldo does not, or at least not by much). So if we used goals minus xG as the finishing metric, then by that definition they would be classified as average strikers, which they clearly are not.
The key point is that xG is better interpreted as a combination of a player’s ability to get into dangerous positions and their finishing ability. Over a large enough sample size, this tends to converge with the number of goals scored. So, if a player consistently exceeds their xG, it could reflect not just their finishing skill, but also their team’s ability to create high-quality chances for them.
xG is therefore useful in the short term to spot hot streaks and overperformers. But in the long term, a better measure of a striker’s quality is xGOT minus xG (as I discussed here: https://www.pythonfootball.com/p/expected-goals-on-target-xgot-101?r=5mroiq).
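As a rough sketch of that calculation: the field names and the three shots below are made up for illustration, and off-target shots are treated as having an xGOT of 0, which is the usual convention as far as I know.

```python
from collections import defaultdict


def xgot_minus_xg(shots):
    """Sum xGOT - xG per player; off-target shots contribute 0 xGOT."""
    totals = defaultdict(float)
    for shot in shots:
        xgot = shot["xgot"] if shot["on_target"] else 0.0
        totals[shot["player"]] += xgot - shot["xg"]
    return dict(totals)


# Made-up shots, purely illustrative
shots = [
    {"player": "A", "xg": 0.10, "xgot": 0.35, "on_target": True},
    {"player": "A", "xg": 0.20, "xgot": 0.00, "on_target": False},
    {"player": "B", "xg": 0.30, "xgot": 0.15, "on_target": True},
]

for player, value in xgot_minus_xg(shots).items():
    print(f"{player}: {value:+.2f}")
```

A positive total suggests the player's shot placement added value beyond the quality of the chances, while a negative total suggests the opposite. As with goals minus xG, the number only becomes meaningful over a large sample of shots.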
I hope this makes sense.
If even players like Cristiano fail to outperform their cumulative xG, which again is "the probability an average player scores from a shot", then those players aren't better than average. That's the logical deduction.
But that is contradictory, because we know Cristiano is a top finisher. So the problem is in the hypothesis itself: that definition of xG is wrong, and xG doesn't represent the chance of an average player. It is a simple proof by contradiction.
Using xGOT minus xG as a finishing metric makes more sense, but in my opinion that's because xGOT is also an expected metric, calculated using the same sample of players as xG. So taking their difference more or less cancels the sample effect and gives a value that reflects shot quality more directly. Yes, it's a good alternative for measuring finishing, but the underlying issue of "xG representing the average player" still remains.
This was a helpful and good read. Thanks
Thank you 🙏