#003 ANSWERING THREE SOCCER QUESTIONS STATISTICALLY USING PYTHON.
This festive season was different and unique in many ways. I found myself with some free time and decided to play around with data from the 2020/21 Premier League season. I have been learning, and as I promised in an earlier article, “I aim to use this platform as my sports analytics journal for tracking my learning progress and hopefully make friends while at it.”
Today I would like to share some practical questions and answer them using statistical techniques available in Python and its data analysis modules.
Here is a quick demonstration of three questions we can answer statistically from the EPL 2020/21 season using Python. A quick shoutout to StatsBomb and FBref for making the data available.
1. Which team had the youngest squad (at or below the 25th percentile of the Age series) with Goals above the mean?
Let us first describe the columns in play (Age and Goals).
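Here is a minimal sketch of what this step can look like in pandas. The column names (Squad, Age, Goals) are my assumption about how the table is labelled, and the rows are placeholders for illustration, not the real 2020/21 figures.

```python
import pandas as pd

# Placeholder rows for illustration only; NOT the real 2020/21 figures.
# Column names (Squad, Age, Goals) are assumed, not taken from the FBref export.
squads = pd.DataFrame({
    "Squad": ["Team A", "Team B", "Team C", "Team D",
              "Team E", "Team F", "Team G", "Team H"],
    "Age":   [24.5, 25.0, 26.0, 26.5, 27.0, 27.5, 28.0, 29.0],
    "Goals": [55, 36, 60, 83, 40, 66, 47, 41],
})

# describe() reports count, mean, std, min, quartiles and max for each column.
summary = squads[["Age", "Goals"]].describe()
print(summary)
```

The "25%" row of the Age column and the "mean" row of the Goals column are the two numbers the first question depends on.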
Having described our data, we now ask the question.
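Asking the question comes down to a boolean filter. The sketch below assumes the same hypothetical column names (Squad, Age, Goals) and uses placeholder rows rather than the real season data.

```python
import pandas as pd

# Placeholder rows for illustration only; NOT the real 2020/21 figures.
squads = pd.DataFrame({
    "Squad": ["Team A", "Team B", "Team C", "Team D",
              "Team E", "Team F", "Team G", "Team H"],
    "Age":   [24.5, 25.0, 26.0, 26.5, 27.0, 27.5, 28.0, 29.0],
    "Goals": [55, 36, 60, 83, 40, 66, 47, 41],
})

age_q25 = squads["Age"].quantile(0.25)   # 25th percentile of average age
goals_mean = squads["Goals"].mean()      # league-wide mean of goals

# Keep squads in the youngest quarter that also scored above the mean.
young_scorers = squads[(squads["Age"] <= age_q25) & (squads["Goals"] > goals_mean)]
print(young_scorers.sort_values("Age"))
```

Each condition sits in its own parentheses because `&` binds more tightly than the comparison operators in Python.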
Aston Villa were the youngest squad to score more goals than the average.
2. Which squad had the highest average age of players, and how many goals did they score?
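One way to ask this in pandas is `idxmax`, which returns the index label of the largest value. This is a sketch with assumed column names (Squad, Age, Goals) and placeholder rows, not the real season data.

```python
import pandas as pd

# Placeholder rows for illustration only; NOT the real 2020/21 figures.
squads = pd.DataFrame({
    "Squad": ["Team A", "Team B", "Team C", "Team D",
              "Team E", "Team F", "Team G", "Team H"],
    "Age":   [24.5, 25.0, 26.0, 26.5, 27.0, 27.5, 28.0, 29.0],
    "Goals": [55, 36, 60, 83, 40, 66, 47, 41],
})

# Row of the squad with the highest average age.
oldest = squads.loc[squads["Age"].idxmax()]
goals_mean = squads["Goals"].mean()

print(oldest["Squad"], oldest["Age"], oldest["Goals"])
print("Below average goals:", oldest["Goals"] < goals_mean)
```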
Crystal Palace had the highest average age and managed to score 39 goals, which is below the average. This raises the question of whether there is a correlation between age and goals. (More on this later.)
3. Which squads converted more or fewer goals relative to their Expected Goals (xG)?
In this question, we are asking who converted more or fewer goals from their xG.
Let's first describe the xG series.
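Describing a single column works the same way as before. In this sketch the xG column name and the values are placeholders I made up for illustration, not the real season data.

```python
import pandas as pd

# Placeholder rows for illustration only; NOT the real 2020/21 figures.
squads = pd.DataFrame({
    "Squad": ["Team A", "Team B", "Team C", "Team D",
              "Team E", "Team F", "Team G", "Team H"],
    "xG":    [52.0, 38.5, 70.0, 75.0, 56.0, 64.0, 45.0, 42.0],
})

# For a single Series, describe() returns a Series indexed by statistic name.
xg_summary = squads["xG"].describe()
print(xg_summary)
```

The "mean" and "75%" entries are the thresholds the three cases rely on.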
First case: more goals than the 75th percentile and more xG than the 75th percentile.
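A sketch of the first case, assuming hypothetical columns Squad, Goals and xG and using placeholder rows instead of the real season data:

```python
import pandas as pd

# Placeholder rows for illustration only; NOT the real 2020/21 figures.
squads = pd.DataFrame({
    "Squad": ["Team A", "Team B", "Team C", "Team D",
              "Team E", "Team F", "Team G", "Team H"],
    "Goals": [55, 36, 60, 83, 40, 66, 47, 41],
    "xG":    [52.0, 38.5, 70.0, 75.0, 56.0, 64.0, 45.0, 42.0],
})

goals_q75 = squads["Goals"].quantile(0.75)
xg_q75 = squads["xG"].quantile(0.75)

# Top quarter on both measures: created a lot and scored a lot.
case_one = squads[(squads["Goals"] > goals_q75) & (squads["xG"] > xg_q75)]
print(case_one)
```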
We can see that Liverpool, Manchester City and Manchester United performed well. They had more goals and more xG than 75% of the teams, which puts them in the top 25% on both measures.
Second case: fewer goals than the 75th percentile but more xG than the 75th percentile.
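The second case only flips one comparison. Again, this is a sketch with assumed column names (Squad, Goals, xG) and placeholder rows, not the real season data.

```python
import pandas as pd

# Placeholder rows for illustration only; NOT the real 2020/21 figures.
squads = pd.DataFrame({
    "Squad": ["Team A", "Team B", "Team C", "Team D",
              "Team E", "Team F", "Team G", "Team H"],
    "Goals": [55, 36, 60, 83, 40, 66, 47, 41],
    "xG":    [52.0, 38.5, 70.0, 75.0, 56.0, 64.0, 45.0, 42.0],
})

goals_q75 = squads["Goals"].quantile(0.75)
xg_q75 = squads["xG"].quantile(0.75)

# Top quarter for chances created (xG) but not for goals actually scored.
case_two = squads[(squads["Goals"] < goals_q75) & (squads["xG"] > xg_q75)]
print(case_two)
```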
Chelsea and Leeds United had more xG than three quarters of the teams in the 2020/21 season but scored fewer goals than the top 25%.
Third case: fewer goals than the average but more xG than the average.
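For the third case the thresholds are the means rather than the 75th percentiles. The sketch assumes the same hypothetical columns (Squad, Goals, xG) and uses placeholder rows, not the real season data.

```python
import pandas as pd

# Placeholder rows for illustration only; NOT the real 2020/21 figures.
squads = pd.DataFrame({
    "Squad": ["Team A", "Team B", "Team C", "Team D",
              "Team E", "Team F", "Team G", "Team H"],
    "Goals": [55, 36, 60, 83, 40, 66, 47, 41],
    "xG":    [52.0, 38.5, 70.0, 75.0, 56.0, 64.0, 45.0, 42.0],
})

goals_mean = squads["Goals"].mean()
xg_mean = squads["xG"].mean()

# Above-average chances created, below-average goals scored.
case_three = squads[(squads["Goals"] < goals_mean) & (squads["xG"] > xg_mean)]
print(case_three)
```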
Brighton had more expected goals than the average but scored fewer goals than the average.
BONUS
Using the tools at our disposal, we can easily answer some key questions with descriptive and summary statistics, which helps us probe further in our analysis.
It is exciting how many questions I can ask.
Let's play further with the data.
Which team used the fewest players and scored more goals than 75% of the teams?
Let's sit back and ask Python.
First, some descriptive stats.
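A sketch of the descriptive step for the bonus question. The Players column (number of players used) is a hypothetical name, and the rows are placeholders, not the real season data.

```python
import pandas as pd

# Placeholder rows for illustration only; NOT the real 2020/21 figures.
# "Players" is an assumed column name for the number of players used.
squads = pd.DataFrame({
    "Squad":   ["Team A", "Team B", "Team C", "Team D",
                "Team E", "Team F", "Team G", "Team H"],
    "Players": [24, 27, 28, 23, 29, 25, 26, 30],
    "Goals":   [55, 36, 60, 83, 40, 66, 47, 41],
})

bonus_summary = squads[["Players", "Goals"]].describe()
print(bonus_summary)
```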
Then let's ask the question,
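One possible way to phrase it: first keep the squads above the 75th percentile of goals, then pick the one that used the fewest players. The column names (Squad, Players, Goals) are assumptions and the rows are placeholders, not the real season data.

```python
import pandas as pd

# Placeholder rows for illustration only; NOT the real 2020/21 figures.
squads = pd.DataFrame({
    "Squad":   ["Team A", "Team B", "Team C", "Team D",
                "Team E", "Team F", "Team G", "Team H"],
    "Players": [24, 27, 28, 23, 29, 25, 26, 30],
    "Goals":   [55, 36, 60, 83, 40, 66, 47, 41],
})

goals_q75 = squads["Goals"].quantile(0.75)

# Squads that scored more than 75% of the league.
top_scorers = squads[squads["Goals"] > goals_q75]

# Among those, the squad that used the fewest players.
fewest_players = top_scorers.nsmallest(1, "Players")
print(fewest_players)
```

Filtering first and ranking second keeps the two parts of the question separate, which makes it easy to swap in a different threshold.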
The Answer is,
This is a very interesting and addictive way to interact with soccer data using statistics and Python programming. The rabbit hole led me to writing a bonus section LOL.
Thank you for your time,
Cheers ❤