9 Comments
Ryan Inghilterra's avatar

This is excellent Martin! Thank you for the detailed write up.

MartinOnData's avatar

Thank you, Ryan. It's actually the most time I've spent on a newsletter.

Ryan Inghilterra's avatar

I can tell. Formulating a topic you want to write on, then actually crunching the numbers, creating the visuals, and turning it into a final cohesive article is a lot of work!

George H.'s avatar

Fascinating analysis. Two of my best friends are xG haters so you have given me some excellent counterpoints to their rudimentary arguments (😂). Thanks.

MartinOnData's avatar

Glad it helps, George. The xG 101 newsletter is also anti xG-hater material 😅. Check it out if you haven't already (https://www.pythonfootball.com/p/expected-goals-xg-101)

James Dailey's avatar

The StatsBomb model had material upgrades in summer/autumn 2022, FYI.

Having seen the lifecycles and evolution of data and analytics in MLB, football is following a similar path. StatsBomb's addition of qualitative factors such as categorized shot velocity and height of the ball at the time of shots will eventually be supplanted by full integration of tracking and event data, including the movement of the ball itself - including spin on shots, etc.

I have worked with Wyscout, Opta, and StatsBomb data over the last 5+ years and the post-2022 StatsBomb xG model (PSxG even more so) is much closer to an objective measurement of reality on a micro basis than the other 2 I have worked with.

This is an important distinction in analytics, IMO. When top-down and bottom-up do not reconcile logically and coherently, it is often a flag that one is dealing in correlation without causation.

MartinOnData's avatar

Love this. Thank you for sharing! It is indeed what I thought (and feared) about the provider (retro)updating their model. Unfortunately that was the only data I had 😅

Juan Vasquez's avatar

Thanks for the article, it was very exhaustive. To add to it, it would be nice to know which provider is more accurate. You defined reality as team performance in terms of points and ranking, but xG is about odds: its correspondence with reality lies in how accurately it represents a chance. So we should take the cases where the providers disagree and review the plays to assess which provider is more accurate. It is subjective, but it is the only way to know. I would also only take into account the xG on open plays, as those are the difficult ones. Wyscout is the crazy uncle in the family, but if we checked the plays it might turn out to be the most accurate.

MartinOnData's avatar

Hi Juan, thank you for the suggestions. Indeed, doing an analysis at a more disaggregated shot/chance level would make (more) sense.
