#002 SCATTER PLOTS: ANALYZING EXPECTED ASSISTS (xA) AND EXPECTED GOALS (xG)

Kiplagat Seroney
Nov 22, 2021

Greetings and welcome to Article #002.

Let’s build on where we started: analyzing scatter relationships between two correlated variables. Today we are focusing on Expected Assists (xA) and its relationship with Expected Goals (xG).

Definition of terms

From FBREF, here are the definitions of our variables:

Expected Assists (xA): the xG which follows a pass that assists a shot. This indicates a player’s ability to set up scoring chances without having to rely on the actual result of the shot or the shooter’s luck/ability. Note: because xA comes from passes, not all assists will be given an xA value.

Expected Goals (xG): the probability that a shot will result in a goal, based on the characteristics of that shot and the events leading up to it.

Expected Assists (xA) is our independent variable (x) and Expected Goals (xG) is our dependent variable (y): xA is treated as the cause and xG as the effect. This makes sense because we generally require chance creation (assists) to get goals.
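To make the link between the two definitions concrete, here is a minimal sketch that rolls a handful of shots up into team totals. The `Shot` record, the `team_totals` helper and all the shot values are hypothetical, made up for illustration rather than taken from FBREF. The only rule borrowed from the definitions above is that an assisted shot’s xG is also credited as xA to the pass that set it up.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    team: str
    xg: float       # probability this shot becomes a goal
    assisted: bool  # True if a pass directly set up the shot


def team_totals(shots):
    """Sum xG over all shots and xA over assisted shots, per team (hypothetical helper)."""
    totals = {}
    for shot in shots:
        xg_sum, xa_sum = totals.get(shot.team, (0.0, 0.0))
        xg_sum += shot.xg
        if shot.assisted:
            # Per the definition, the assisting pass earns the shot's xG as xA.
            xa_sum += shot.xg
        totals[shot.team] = (xg_sum, xa_sum)
    return totals


# Hypothetical shots for two made-up teams, for illustration only.
shots = [
    Shot("Team A", 0.35, True),
    Shot("Team A", 0.10, False),
    Shot("Team A", 0.50, True),
    Shot("Team B", 0.05, False),
    Shot("Team B", 0.20, True),
]

for team, (xg, xa) in team_totals(shots).items():
    print(f"{team}: xG = {xg:.2f}, xA = {xa:.2f}")
```

In this toy model, a team’s xA is simply the xG of its assisted shots, and unassisted shots add to xG only. This built-in overlap is one reason the two totals move together so closely.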

Let’s go.

PART I (USING EXCEL FOR REGRESSION)

a. The first step was getting the 2020/21 Premier League data (StatsBomb data, via FBREF).

b. After cleaning the data to my specifications, I went to the Data tab in Excel and selected Data Analysis. This option only appears once the Analysis ToolPak add-in has been activated.

c. I selected Regression, with xG as the Input Y Range and xA as the Input X Range.

The results of the analysis are shown in the small table below.

xG is y (Dependent Variable) while xA is x (Independent Variable)

From the output we can observe an R Square of approximately 96.28% (0.9628).

d. From here, I repeated the same steps with the 2021/22 season data from FBREF (as of 12 games played).

This time we observe an R Square of approximately 95.68%.

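The high R Square shows that xA explains most of the variation in xG across teams, which supports the idea that Expected Assists drive Expected Goals (Expected Goals in this case is the effect). Strictly speaking, a regression shows a strong association rather than proof of causation, and part of the link is built into the metrics: by definition, the xA of a pass is the xG of the shot it sets up.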

*** In a normal soccer game, Assists do actually cause Goals (the effect of the cause). ***
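If you don’t have Excel, the same simple regression can be reproduced in a few lines of Python. This is a sketch and not the Analysis ToolPak itself. It fits xG = intercept + slope × xA by ordinary least squares and reports R Square. The `fit_line` helper and the four data points are hypothetical placeholders, so swap in the 20 team totals exported from FBREF.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x, plus R Square."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length (at least 2 points)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R Square = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical team totals (xA as x, xG as y), for illustration only.
xa = [40.0, 30.0, 25.0, 20.0]
xg = [60.0, 45.0, 40.0, 30.0]

slope, intercept, r2 = fit_line(xa, xg)
print(f"xG = {intercept:.2f} + {slope:.2f} * xA, R Square = {r2:.4f}")
```

With a single predictor, this R Square equals the square of the Pearson correlation between xA and xG. Excel reports that correlation as "Multiple R" just above the "R Square" row.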

PART II (USING DATAWRAPPER AS A QUICK VIZ TOOL)

Datawrapper is a dynamic tool that I only began using recently. It is a sweet tool for making quick visualizations, and the best part is that it is free and you can do a lot without logging in.

Let’s visualize the data and try to derive insights from the graphs.
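Datawrapper accepts pasted or uploaded CSV data, so one convenient route is to export the cleaned table with one row per team. The sketch below assumes a simple three-column layout (Team, xA, xG) that you then map to the X and Y axes in Datawrapper. The `to_datawrapper_csv` helper and the values are hypothetical.

```python
import csv
import io


def to_datawrapper_csv(rows):
    """Return CSV text with one row per team: Team, xA, xG (hypothetical layout)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Team", "xA", "xG"])  # header row becomes the column names
    for team, xa, xg in rows:
        writer.writerow([team, f"{xa:.1f}", f"{xg:.1f}"])
    return buffer.getvalue()


# Hypothetical values; replace with the cleaned FBREF table.
rows = [("Team A", 20.4, 28.9), ("Team B", 15.2, 21.0)]
csv_text = to_datawrapper_csv(rows)
print(csv_text)
```

Keeping the team name as the first column makes it easy to use as point labels on the scatter plot.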

a. THE 2021/22 ENGLISH PREMIER LEAGUE SEASON

  • TOP 6 ANALYSIS AS AT MATCH 12

Insights

i. Liverpool sit highest on both of our metrics. However, they are 3rd as of Match 12. Before the international break they were involved in a ‘high’-scoring 2–2 draw with Brighton and a 3–2 loss against the Hammers. As per the Premier League table, their Goals Against (GA) is 11, while Chelsea and Man City have conceded 10 combined. However, the 4–1 win against Arsenal helps balance things in terms of Goal Difference.

ii. My beloved Chelsea are top of the table with a lower xA and xG than Liverpool and Manchester City. However, it is important to note that, compared to the two, Chelsea have the most wins (9), and they have only 2 draws compared to Liverpool’s 4.

iii. An interesting one is Arsenal, who are 5th on the table. Looking at Goal Difference, the top 4 consists of Chelsea (+26), Manchester City (+19), Liverpool (+24) and West Ham (+), while Arsenal’s Goal Difference is (-4), so they sit 5th despite a negative Goal Difference.
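The insights above lean on how the Premier League table is ordered: points first (three for a win, one for a draw), then Goal Difference, then goals scored. Here is a minimal sketch of that ordering. It uses hypothetical records and a hypothetical `league_table` helper, and it leaves out the further tie-breakers. The example shows how extra wins can outweigh extra draws, which is the Chelsea vs Liverpool comparison above.

```python
from typing import NamedTuple


class Record(NamedTuple):
    team: str
    wins: int
    draws: int
    losses: int
    goals_for: int
    goals_against: int

    @property
    def points(self) -> int:
        return 3 * self.wins + self.draws  # 3 for a win, 1 for a draw, 0 for a loss

    @property
    def goal_difference(self) -> int:
        return self.goals_for - self.goals_against


def league_table(records):
    """Sort by points, then goal difference, then goals scored (all descending)."""
    return sorted(
        records,
        key=lambda r: (r.points, r.goal_difference, r.goals_for),
        reverse=True,
    )


# Hypothetical records after 12 matches, for illustration only.
records = [
    Record("Team A", 7, 4, 1, 30, 11),
    Record("Team B", 8, 2, 2, 20, 10),
    Record("Team C", 8, 1, 3, 25, 10),
]
for pos, r in enumerate(league_table(records), start=1):
    print(pos, r.team, r.points, r.goal_difference)
```

Team B tops this toy table on 26 points despite the lowest Goal Difference. Team A and Team C are level on 25 points, and Goal Difference separates them.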

Note to self

  • A fairer comparison would include last season’s data as of Match 12.
  • I am thinking of doing a multiple regression analysis incorporating other metrics that fit the model. It’s clear the model does not take other variables into account. I will continue researching, learn more about this topic and do a much more comprehensive analysis.
  • Since I am very green to the world of Soccer Analytics, I hope to review Regression as a topic more holistically.
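As a starting point for the multiple regression idea in the notes above, here is a minimal sketch using NumPy’s least-squares solver. The second predictor is a made-up placeholder metric, and the `fit_multiple` helper and all data values are hypothetical. This is one standard way to fit the model, not the only one.

```python
import numpy as np


def fit_multiple(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bk*xk; returns (coefficients, R Square)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    fitted = design @ coef
    ss_res = np.sum((y - fitted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return coef, 1.0 - ss_res / ss_tot


# Hypothetical team data: columns are xA and a second, made-up metric.
X = [[40, 500], [30, 420], [25, 380], [20, 300], [35, 450]]
y = [60, 46, 41, 31, 52]

coef, r2 = fit_multiple(X, y)
print("intercept, b_xA, b_other:", np.round(coef, 3))
print("R Square:", round(r2, 4))
```

With only 20 teams per season, each extra predictor uses up a degree of freedom. Adjusted R Square, which Excel’s Regression output also reports, is therefore the fairer way to compare models with different numbers of metrics.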

  • BOTTOM 5 ANALYSIS AS AT MATCH 12

Insights

i. Newcastle are last as of Match 12 with 15 goals for and 27 goals against, even though Norwich, above them, have scored only 7 goals this season with 27 goals against. This is consistent with xG and xA being correlated with Goals and Assists. HOWEVER, Newcastle’s lack of a win this season is what keeps them last.

b. THE 2020/21 ENGLISH PREMIER LEAGUE SEASON

  • THE TOP 6 ANALYSIS

Insights

i. Manchester United finished 2nd; however, on the scatter plot they sit below Man City, Liverpool and Chelsea. A quick glance at the table provided by StatsBomb and FBREF indicates Manchester United had the least number of Draws and the second-highest GD after the Champions, Manchester City. Further, Man United recorded the third-highest number of Draws (13).

  • THE BOTTOM 5 ANALYSIS

Insights

i. Fulham earned a higher xA and xG than six other teams, yet they were relegated. This “bad luck” can be attributed to their 5 wins (the lowest), 13 draws (second-highest after Brighton) and 20 losses (third-highest after the other two relegated teams).

  • THE WEIRD ONES AS PER DATA

i. Outliers in the top 6 (THE 2020/21 SEASON)

  1. Tottenham #7 had 18W (5th of 12), 8D (6th of 10) and 12L (9th of 12). From a quick analysis of GD, they were 4th (GD of +23).
  2. Leeds #9 had 18W (5th of 12), 5D (8th of 9) and 15L (7th of 12).

*** Since this is a simple linear regression analysis, let’s not attribute these outliers to other variables. However, it is important to note that other variables do have an effect. ***

ii. Outliers in the bottom 5 (THE 2020/21 SEASON)

Note:

NB.1. Rankings (the figures in brackets) are in descending order, whether for Wins, Draws or Losses.

NB.2. Where several teams are tied on W, D or L, I count the tie as a single rank, so no ranks are skipped.

  1. Crystal Palace had 12W (8th of 12), 8D (6th of 10) and 18L (5th of 12)
  2. Wolves had 12W (8th of 12), 9D (5th of 10) and 17L (6th of 12)
  3. Newcastle had 12W (8th of 12), 9D (5th of 10) and 17L (6th of 12)
  4. Southampton had 12W (8th of 12), 7D (7th of 10) and 19L (4th of 12)
  5. Everton had 17W (6th of 12), 8D (6th of 10) and 13L (8th of 12)
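The bracketed ranks follow a "dense" ranking: values are sorted from highest to lowest, tied values share one rank, and no ranks are skipped. That is why, for example, 12 distinct win totals give "of 12". Here is a minimal sketch of that rule using the 2020/21 win totals quoted in this article. They are only a subset of the league, so the ranks printed here differ from the figures in brackets. The `dense_rank_desc` helper is hypothetical.

```python
def dense_rank_desc(values):
    """Rank values from highest to lowest; ties share one rank and no ranks are skipped."""
    distinct = sorted(set(values), reverse=True)
    rank_of = {value: rank for rank, value in enumerate(distinct, start=1)}
    return [rank_of[v] for v in values], len(distinct)


# 2020/21 wins quoted in this article (a subset of the league, so ranks differ from the brackets).
wins = {
    "Tottenham": 18, "Leeds": 18, "Everton": 17, "Crystal Palace": 12,
    "Wolves": 12, "Newcastle": 12, "Southampton": 12,
}
ranks, n_distinct = dense_rank_desc(list(wins.values()))
for (team, w), r in zip(wins.items(), ranks):
    print(f"{team}: {w}W ({r} of {n_distinct})")
```

In pandas, `Series.rank(method="dense", ascending=False)` produces the same ranks (returned as floats).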

iii. Outliers in the top 6 (THE 2021/22 SEASON)

Insights

  1. Leeds United #17 has 2W, 5D and 5L
  2. Brentford #14 has 3W, 4D and 5L
  3. Southampton #13 has 3W, 5D and 4L
  4. Leicester #12 has 4W, 3D and 5L
  5. Everton #11 has 4W, 3D and 5L
  6. Crystal Palace #10 has 3W, 7D and 2L
  7. Manchester United #8 has 5W, 2D and 5L

iv. Outliers in the bottom 5 (THE 2021/22 SEASON)

Insights

  1. Tottenham #7 has 6W 1D 5L
  2. Brighton #9 has 4W 5D 3L
  3. Arsenal #5 has 6W 2D 4L. They have the fifth-lowest xGD (Expected Goals Difference: xG minus xG allowed. xG totals include penalty kicks, but do not include penalty shootouts, unless otherwise noted. Provided by StatsBomb.)
  4. Wolves #6 has 6W 1D 5L
  5. Watford has 4W 1D 7L
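The outliers above were read off the scatter plots. One way to make "the weird ones" less subjective is to compare each team’s rank on a metric with its actual league position and flag the biggest gaps. The sketch below is an alternative to the author’s visual method and was not used in this analysis. The `rank_gaps` helper, the team labels, the values and the 3-place threshold are all hypothetical.

```python
def rank_gaps(xg_by_team, position_by_team, threshold=3):
    """Return teams whose league position differs from their xG rank by at least `threshold` places."""
    # Rank 1 = highest xG; ties are broken by team name for a stable order.
    by_xg = sorted(xg_by_team, key=lambda t: (-xg_by_team[t], t))
    xg_rank = {team: i for i, team in enumerate(by_xg, start=1)}
    gaps = {t: position_by_team[t] - xg_rank[t] for t in xg_by_team}
    # Positive gap: finished lower in the table than the xG rank suggests; negative: higher.
    return {t: g for t, g in gaps.items() if abs(g) >= threshold}


# Hypothetical xG totals and league positions, for illustration only.
xg = {"A": 70.0, "B": 65.0, "C": 50.0, "D": 45.0, "E": 40.0, "F": 30.0}
position = {"A": 1, "B": 5, "C": 2, "D": 3, "E": 4, "F": 6}
print(rank_gaps(xg, position))
```

Here team B has the second-highest xG but finishes 5th, so it is the only team flagged at a 3-place threshold.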

PART III (USING PYTHON)

Inspired by FC Python’s piece, Introduction to Simple Linear Regression, and John McKay’s video on Linear Regression.

MULTI-METRIC ANALYSIS

Computing power is impressive and just sexy. Check out the figure below illustrating various metrics from the EPL season of 2020/21.

THE FIGURE ABOVE DISPLAYS CORRELATION BETWEEN VARIOUS METRICS

The metrics are Goals, Assists, Expected Goals (xG), Expected Assists (xA), Expected Goals per 90 (xG/90) and Expected Assists per 90 (xA/90).

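From the above plot, we can derive the following insights:

  1. Assists and Goals are very strongly correlated.
  2. Expected Assists and Expected Goals are very strongly correlated.
  3. xG and xG/90 are perfectly correlated since xG/90 is derived from xG (xG divided by the number of 90-minute periods played); when every team has played the same number of matches, xG/90 is just xG rescaled by a constant. The same applies to xA and xA/90.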

GENERALLY, THESE METRICS DON’T VARY FAR FROM EACH OTHER, SINCE THEY ARE LINKED BY BOTH FOOTBALL LOGIC AND MATHEMATICAL LOGIC.
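The correlation figure can be reproduced as a matrix of Pearson correlations between the metric columns. Here is a minimal pandas sketch. The season totals are hypothetical, and every team is assumed to have played 38 full matches, which is what makes each per-90 column an exact rescaling of its total. Plotting the matrix as a heatmap is a separate step and is left out here.

```python
import pandas as pd

# Hypothetical season totals for four teams, for illustration only.
df = pd.DataFrame({
    "Goals": [80, 65, 50, 40],
    "Assists": [55, 45, 38, 27],
    "xG": [73.0, 65.5, 52.1, 40.8],
    "xA": [54.2, 47.9, 36.0, 29.3],
})
nineties = 38  # assumption: every team played 38 full matches (38 x 90 minutes)
df["xG/90"] = df["xG"] / nineties
df["xA/90"] = df["xA"] / nineties

corr = df.corr()  # Pearson correlation by default
print(corr.round(3))
```

At player level, minutes played differ from player to player, so xG and xG/90 would no longer be exactly correlated.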

FINDING THE RIGHT STUFF

Above are a few tests that try to find the sweet spot on matters of Correlation and Linear Regression. Don’t stop here: see FURTHER READING from FC Python.

NB. This is not comprehensive, this is just me having fun. Go ahead and have fun too.

CONCLUSION

General observations

  • This analysis aims to look at direction, that is, whether a team is heading in the right direction towards converting its chances. However, factors like Draws and Losses, along with Goal Difference, also come into play.
  • The scatter relationships illustrated above roughly reflect the League table. I will try to fine-tune the analysis in the coming days as I continue learning.
  • It would be interesting to use Goal Difference. The model doesn’t take Goal Difference into account, as the Arsenal example above shows. I will research more on this topic.
  • McKay shows a great analysis of Rank with respect to Goal Difference.

Limitations

This covers just a few metrics. What is the correlation between other metrics?

Cheers Guys and Gals, I hope this brings you some insight. This is my second article, so let’s discuss: hit my DM. I am willing to learn more from the Community.
