Premier League 2020/21 - Data Analysis

Published on July 20, 2021 by Mariusz Borycki

Photo by Nathan Rogers on Unsplash

Football is the most popular sport in the world, with around 4 billion fans [1].

The Premier League is the top level of the English football league system and contains 20 clubs. Currently (in 2021), it is considered the best league in the world [2].

 

As a football fan, and given the points above, I decided to analyse the 2020/21 season of the English Premier League.

 

Data source:

 

Context:

This dataset is a collection of basic but crucial statistics of the English Premier League 2020/21 season. The dataset has all the players that played in the EPL and their standard statistics such as Goals, Assists, xG (Expected Goals), xA (Expected Assists), Passes Attempted, Pass Accuracy and more!

 

Columns description:

Position: Each player has a position in which he plays regularly. The positions in this dataset are: FW - Forward, MF - Midfielder, DF - Defender, GK - Goalkeeper.
Starts: The number of times the player was named in the starting 11 by the manager.
Mins: The number of minutes played by the player.
Goals: The number of goals scored by the player.
Assists: The number of times the player assisted another player in scoring a goal.
Passes_Attempted: The number of passes attempted by the player.
Perc_Passes_Completed: The percentage of attempted passes that the player completed to a teammate.
xG: Expected number of goals from the player in a match.
xA: Expected number of assists from the player in a match.
Yellow_Cards: Players get a yellow card from the referee for indiscipline, technical fouls, or other minor fouls.
Red_Cards: Players get a red card for accumulating 2 yellow cards in a single game, or for a major foul.

 

My objectives:

  • Number of yellow and red cards
  • Total number of cards within nationality
  • Number of cards per 1 player within nationality
  • Number of cards per 1 player within position
  • Amount of goals and assists
  • Scored goals - TOP 10 players
  • Assists - TOP 10 players
  • Canadian points - TOP 10 players
  • Goals by position on field
  • Assists by position on field
  • Amount of not scored goals - based on xG factor
  • Expected goal
  • Passes attempted - TOP 10 players
  • Accurate passes attempted 
  • The oldest players
  • The youngest players
  • Average age by position on field
  • Average age in EPL clubs
  • Amount of nationalities Premier League clubs

 

Dataset Content:

Before I started, I checked some basic information about my dataset:

 

Mariusz Borycki - first table info

 

As we can see above, my table contains 18 columns and 532 rows. 
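
A check like this can be sketched as follows. The small frame below is made-up sample data using a few of the columns described earlier, not the real dataset; with the real file you would load it into `df` first (the file name is not given in this post).

```python
import pandas as pd

# Made-up sample rows, only to illustrate the checks (not the real dataset).
sample = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Club": ["Club X", "Club X", "Club Y"],
    "Position": ["FW", "MF", "GK"],
    "Goals": [10, 3, 0],
})

n_rows, n_cols = sample.shape   # number of rows and columns
missing = sample.isna().sum()   # missing values per column
sample.info()                   # column names, dtypes and non-null counts
```

On the real dataset, `shape` is what gives the 532 rows and 18 columns mentioned above.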

 

 


YELLOW AND RED CARDS

The first chart shows how many yellow and red cards all the EPL clubs got:

 

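import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Assumption: `df` is the dataset loaded into a pandas DataFrame and `df_1`
# holds the card totals per club; neither definition is shown in this post.
countries = df_1.index.tolist()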
red_cards = df_1['Red_Cards'].tolist()
yellow_cards = df_1['Yellow_Cards'].tolist()

width = 0.75

 

fig, ax = plt.subplots(figsize=(16, 10))

ax.bar(countries, yellow_cards, width, label='Yellow Cards', color='gold')
ax.bar(countries, red_cards, width, bottom=yellow_cards, label='Red Cards', color='orangered')

ax.set_ylabel('Amount of Cards', fontsize=14)
ax.set_title('Number of Yellow and Red Cards in Premier League (2020/21)', loc='left', fontsize=18, fontweight ='bold')
plt.xticks(countries, rotation=90, fontsize=12)

 

for index, data in enumerate(red_cards):
    plt.text(x=index , y=data + yellow_cards[index] + 1 , s=f"{data}" , fontdict=dict(fontsize=14), horizontalalignment='center')

 

for index, data in enumerate(yellow_cards):
    plt.text(x=index , y=20 , s=f"{data}" , fontdict=dict(fontsize=14), horizontalalignment='center')
    
ax.legend(fontsize=12, frameon=False)

fig.text(0.9, -0.08, 'mariuszborycki.com', fontsize = 12,
         color ='grey', ha ='right', va ='bottom',
         alpha = 0.7)

plt.show()


Mariusz Borycki - amount of cards by Premier League clubs


The chart is sorted in descending order by red cards. It shows that Brighton received the most red cards in the Premier League.
However, Sheffield United received the most yellow cards and also 4 red cards, which was the highest total number of cards in the entire league.
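
The post does not show how `df_1` is built. One possible reconstruction, assuming it is simply the card totals per club sorted by red cards, is sketched below on made-up sample rows; the names `df_sample` and `df_1_sample` are hypothetical.

```python
import pandas as pd

# Made-up sample rows; the real df has one row per player.
df_sample = pd.DataFrame({
    "Club": ["Club X", "Club X", "Club Y", "Club Z"],
    "Yellow_Cards": [5, 3, 7, 2],
    "Red_Cards": [0, 1, 2, 0],
})

# Hypothetical reconstruction of df_1: card totals per club,
# with the clubs that have the most red cards first.
df_1_sample = (
    df_sample.groupby("Club")[["Yellow_Cards", "Red_Cards"]]
    .sum()
    .sort_values(by=["Red_Cards", "Yellow_Cards"], ascending=False)
)
```

The club names end up in the index, which matches how the plotting code reads them with `df_1.index.tolist()`.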

 

Next, I checked the number of cards per nationality, to see which nation received the most cards:

 

df_2 = df[['Name','Nationality','Yellow_Cards','Red_Cards']].groupby('Nationality').agg({'Yellow_Cards':'sum', 'Red_Cards':'sum','Name':'count'}).sort_values(by=['Yellow_Cards', 'Red_Cards'], ascending=False).head(15)
df_2.rename(columns={'Name':'#_Players'}, inplace=True)

countries = df_2.index
red_cards = df_2['Red_Cards']
yellow_cards = df_2['Yellow_Cards']

fig, ax = plt.subplots(figsize =(16, 9))
ax.barh(countries, yellow_cards, color='gold')
ax.barh(countries, red_cards, left=yellow_cards, color='orangered')
 
# Remove axes spines
for s in ['top', 'bottom', 'left', 'right']:
    ax.spines[s].set_visible(False)

ax.grid(True, color ='grey',  # positional flag: the 'b' keyword was removed in newer Matplotlib
        linestyle ='-.', linewidth = 0.5,
        alpha = 0.2)
 
# Show top values
ax.invert_yaxis()
 
x_red = red_cards.tolist()
y_yellow = yellow_cards.tolist()

 

for index, data in enumerate(x_red):
    plt.text(x=data + y_yellow[index], y=index , s=f"{data}" , fontdict=dict(fontsize=14), verticalalignment='center')

 

for index, data in enumerate(y_yellow):
    plt.text(x=data /2 - 5, y=index, s=f"{data}" , fontdict=dict(fontsize=14), verticalalignment='center')
    

 

ax.set_title('Total number of cards (yellow + red) within nationality',
             loc ='left', size=15, fontweight ='bold' )
 
fig.text(0.9, 0, 'mariuszborycki.com', fontsize = 12,
         color ='grey', ha ='right', va ='bottom',
         alpha = 0.7)

plt.yticks(fontsize=12)


plt.show()

 

Mariusz Borycki - amount of cards nationality in Premier League 2020-21

 

As we can see, the majority of the cards went to English players, which should not be surprising given that I am analysing the English Premier League. For that reason, I divided the total number of cards by the number of players of each nationality.

This gives the average number of cards per player for each nation, which is a fairer comparison:

 

cards_per_nation = df[['Name', 'Nationality', 'Yellow_Cards', 'Red_Cards']].groupby('Nationality').agg({'Yellow_Cards':'sum', 'Red_Cards':'sum', 'Name':'count'})
cards_per_nation['Total_Cards'] = cards_per_nation.Yellow_Cards + cards_per_nation.Red_Cards
cards_per_nation.rename(columns={'Name':'#_Players'}, inplace=True)
cards_per_nation['Cards_per_Player'] = cards_per_nation['Total_Cards'] / cards_per_nation['#_Players']
cards_per_nation = cards_per_nation.sort_values(by=['Cards_per_Player', '#_Players'], ascending=False).reset_index().head(25)

countries = cards_per_nation['Nationality']
cards = cards_per_nation['Cards_per_Player']
 
fig, ax = plt.subplots(figsize =(16, 12))
ax.barh(countries, cards, color='mediumaquamarine')

 

for s in ['top', 'bottom', 'left', 'right']:
    ax.spines[s].set_visible(False)

ax.grid(True, color ='grey',  # positional flag: the 'b' keyword was removed in newer Matplotlib
        linestyle ='-', linewidth = 0.2,
        alpha = 0.2)
 
ax.invert_yaxis()

 
for i in ax.patches:
    plt.text(i.get_width()+0.2, i.get_y()+0.5,
             str(round((i.get_width()), 2)),
             fontsize=12,
             color='black') # fontweight ='bold',

ax.set_title('Number of cards (yellow + red) per 1 player within nationality',
             loc='left', size=15, fontweight='bold')
 
fig.text(0.9, 0, 'mariuszborycki.com', fontsize=14,
         color='grey', ha='right', va='bottom',
         alpha=0.7)

plt.yticks(fontsize=12)


plt.show()

 

Mariusz Borycki - amount of cards per player in Premier League 2020-21

 

Now you can see that, measured by cards received per player of a given nationality, English players are only in 25th place. In first place, however, we have players from Mali and North Macedonia. Mali is represented by two players, Moussa Djenepo of Southampton and Yves Bissouma of Brighton, who received 14 cards, the same result as the North Macedonian player Ezgjan Alioski of Leeds United.
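
To see why dividing by the number of players changes the ranking, here is a minimal sketch on made-up numbers. The nationality code `XXX` and all card counts are invented for illustration only.

```python
import pandas as pd

# Made-up data: six players from one nation, two from another.
players = pd.DataFrame({
    "Nationality": ["ENG"] * 6 + ["XXX"] * 2,
    "Yellow_Cards": [4, 3, 2, 3, 2, 2, 6, 7],
    "Red_Cards": [0, 0, 0, 0, 0, 0, 1, 0],
})
players["Total_Cards"] = players["Yellow_Cards"] + players["Red_Cards"]

per_nation = players.groupby("Nationality").agg(
    Total_Cards=("Total_Cards", "sum"),
    Players=("Total_Cards", "count"),
)
per_nation["Cards_per_Player"] = per_nation["Total_Cards"] / per_nation["Players"]
```

In this sample the larger group has more cards in total (16 versus 14), but the smaller group has far more cards per player (7.0 versus about 2.7). Note that a nation represented by only one or two players can top such a ranking on the strength of a single player's season.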

 

The last card-related topic I wanted to check is how many cards players received depending on their position on the field:

 

cards_per_position = df[['Name','Position', 'Yellow_Cards', 'Red_Cards']].groupby('Position').agg({'Yellow_Cards':'sum', 'Red_Cards':'sum', 'Name':'count'}).reset_index()
cards_per_position['Total_Cards'] = cards_per_position.Yellow_Cards + cards_per_position.Red_Cards
cards_per_position.rename(columns={'Name':'#_Players'}, inplace=True)
cards_per_position['Cards_per_Position'] = cards_per_position['Total_Cards'] / cards_per_position['#_Players']
cards_per_position = cards_per_position.reset_index(drop=True).sort_values(by=['Cards_per_Position', '#_Players'],ascending=False)

positions = cards_per_position['Position']
cards = cards_per_position['Cards_per_Position']

fig, ax = plt.subplots(figsize =(16, 9))
ax.barh(positions, cards, color='paleturquoise')

 

for s in ['top', 'bottom', 'left', 'right']:
    ax.spines[s].set_visible(False)


# Add annotation to bars
for i in ax.patches:
    plt.text(i.get_width(), i.get_y()+0.4,
             str(round((i.get_width()), 2)),
             fontsize = 12,
             color ='black') # fontweight ='bold'

 

ax.set_title('Number of cards (yellow + red) per 1 player within position',
             loc ='left', size=15, fontweight ='bold' )
 
fig.text(0.9, -0.02, 'mariuszborycki.com', fontsize = 14,
         color ='grey', ha ='right', va ='bottom',
         alpha = 0.7)

plt.yticks(fontsize=12)
plt.xticks(fontsize=12)


plt.show()

 

Mariusz Borycki - amount of cards by position in Premier League 2020-21

 

Interestingly, midfielders received more cards (2.82 cards per player) than defenders (2.59 cards per player).

 

 


GOALS AND ASSISTS

The next area to explore is the number of goals and assists in the Premier League:

 

df_goals = df.copy()
df_goals['xGoals'] = round(df_goals.xG * df_goals.Matches,0).astype(int)

df_goals['xG_diff'] =  (df_goals.Goals - df_goals.xGoals).astype(int)


# Keep only the numeric columns (plus the grouping key) before summing per club.
df_goals = df_goals[['Club', 'Goals', 'xGoals', 'xG_diff', 'Assists']]\
.groupby('Club').sum()
df_goals['Canadian_Points'] = df_goals.Goals + df_goals.Assists
df_goals = df_goals.sort_values(by='Canadian_Points', ascending=False) 
df_goals

goals = df_goals['Goals']
assists = df_goals['Assists']
# After groupby('Club'), the club names are in the index, not in a column.
clubs = df_goals.index
 
fig, ax = plt.subplots(figsize =(16, 9))
ax.bar(clubs, goals, label="Goals", color='darkturquoise')
ax.bar(clubs, assists, bottom=goals, label="Assists", color='paleturquoise')

 

for s in ['top', 'bottom', 'left', 'right']:
    ax.spines[s].set_visible(False)

 

x_goals = goals.tolist()
x_assists = assists.tolist()

 

for ia, da in enumerate(x_assists):
    plt.text(x=ia, y=da / 2 + x_goals[ia] + 1, s=f"{da}", fontdict=dict(fontsize=14), horizontalalignment='center', verticalalignment='center')

 

for ig, dg in enumerate(x_goals):
    plt.text(x=ig, y=dg / 2, s=f"{dg}", fontdict=dict(fontsize=14), horizontalalignment='center', verticalalignment='center')

 

ax.set_title('Amount of Goals and Assists in Premier League - sorted by canadian points',
             loc='left', size=15, fontweight='bold')

fig.text(0.9, -0.1, 'mariuszborycki.com', fontsize = 12,
         color ='black', ha ='right', va ='top',
         alpha = 0.5)

ax.legend(fontsize=12, frameon=False)

plt.tick_params(left = False, right = False, labelleft=False, labelbottom=True, bottom=True)
plt.xticks(clubs,rotation=90,fontsize=12)

 

plt.show()

 

Mariusz Borycki - amount of scored goals in Premier League 2020-21

 

Manchester City recorded the most goals and assists last season. Manchester United and Tottenham Hotspur completed the podium.

 

It is also worth seeing which player scored the most goals and who had the most assists in the EPL.

 

TOP 10 scored goals:

 

df[['Name','Club','Goals']].groupby(['Name','Club']).sum().reset_index().sort_values(by=['Goals'],ascending=False).head(10).plot(x='Name', kind='bar', figsize=(16,8), width=0.8, color='lightseagreen', edgecolor='forestgreen', grid=True)
plt.grid(color='lightgrey', linestyle='-', linewidth=0.3, alpha=0.5)

plt.title('Scored Goals - Top 10 players',loc ='left', size=15, fontweight ='bold')


for index, data in enumerate(df[['Name','Club','Goals']].groupby(['Name','Club']).sum().reset_index().sort_values(by=['Goals'],ascending=False).head(10)['Goals']):
    plt.text(x=index, y=data * 0.9, s=f"{data}", fontdict=dict(fontsize=16), horizontalalignment='center', color='w')


plt.xticks(rotation=45, fontsize=14)
plt.yticks(fontsize=12)
plt.xlabel(None)
plt.legend([], frameon=False)

# Use the axes of the current chart; `ax` would still point at an earlier figure.
plt.text(0.8, -0.153, 'mariuszborycki.com', horizontalalignment='center',
         verticalalignment='center', transform=plt.gca().transAxes, color='grey', fontsize=12)

 

plt.show()

 

Mariusz Borycki - TOP10 scored goals in Premier League 2020-21

 

Tottenham's Harry Kane was the league's top scorer last season with 23 goals. Liverpool's Mohamed Salah was just behind him with 22 goals. In third place, with 18 goals, was Bruno Fernandes of Manchester United.

 

TOP 10 players with the most assists in the league:

 

df[['Name','Club','Assists']].groupby(['Name','Club']).sum().reset_index().sort_values(by=['Assists'],ascending=False).head(10).plot(x='Name', kind='bar', figsize=(16,8), width=0.8, color='lightseagreen', edgecolor='forestgreen', grid=True)
plt.grid(color='lightgrey', linestyle='-', linewidth=0.3, alpha=0.5)

plt.title('Assists - Top 10 players',loc ='left', size=15, fontweight ='bold')


for index, data in enumerate(df[['Name','Club','Assists']].groupby(['Name','Club']).sum().reset_index().sort_values(by=['Assists'],ascending=False).head(10)['Assists']):
    plt.text(x=index, y=data * 0.9, s=f"{data}", fontdict=dict(fontsize=16), horizontalalignment='center', color='w')

 

plt.xticks(rotation=45, fontsize=14)
plt.yticks(fontsize=12)
plt.xlabel(None)
plt.legend([], frameon=False)

# Use the axes of the current chart; `ax` would still point at an earlier figure.
plt.text(0.8, -0.15, 'mariuszborycki.com', horizontalalignment='center',
         verticalalignment='center', transform=plt.gca().transAxes, color='grey', fontsize=12)

 

plt.show()

 

Mariusz Borycki - TOP10 Assists in Premier League 2020-21

 

Interestingly, Harry Kane, who tops the list of the Premier League's top scorers, also led here with 14 assists in the season. Two Manchester-based players share second place: Bruno Fernandes, who plays for the "Red Devils", and Kevin De Bruyne of "The Citizens".

 

TOP 10 in Canadian points:

 

Another indicator related to the number of goals and assists is the so-called "Canadian points". This is a statistic rarely used in football, which shows the sum of goals and assists over a selected period (the 2020/21 season in our case).

The chart below shows who had the most goals and assists in the previous season:

 

df_canadian = df[['Name', 'Club', 'Position','Goals', 'Assists']].copy()
df_canadian['Canadian_Points'] = df_canadian.Goals + df_canadian.Assists
df_canadian = df_canadian[['Name','Club','Canadian_Points','Goals', 'Assists']].groupby(['Name','Club']).sum().reset_index().sort_values(by=['Canadian_Points'],ascending=False).head(10)

name = df_canadian['Name']
points = df_canadian['Canadian_Points']
 
fig, ax = plt.subplots(figsize =(16, 9))
ax.bar(name, points, color='lightseagreen')

 

# Write each player's point total inside the bar.
for index, data in enumerate(points):
    plt.text(x=index, y=data * 0.9, s=f"{data}", fontdict=dict(fontsize=16), horizontalalignment='center', color='w')

 

for s in ['top', 'bottom', 'left', 'right']:
    ax.spines[s].set_visible(False)


    
ax.grid(True, color ='grey',  # positional flag: the 'b' keyword was removed in newer Matplotlib
        linestyle ='-', linewidth = 0.3,
        alpha = 0.2)
 
ax.invert_yaxis()
 
ax.set_title('Canadian Points - Top 10 players',
             loc ='left', size=15, fontweight ='bold')
 
fig.text(0.9, 0.15, 'mariuszborycki.com', fontsize = 12,
         color ='grey', ha ='right', va ='bottom',
         alpha = 0.7)

plt.xticks(name,rotation=45,fontsize=14)
plt.yticks(fontsize=12)

 

plt.show()

 

Mariusz Borycki - TOP10 Canadian Points in Premier League 2020-21

 

When it comes to Canadian points, Harry Kane is, unsurprisingly, at the top of the ranking. At the other end, Kevin De Bruyne rounds out the TOP 10.
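
The Canadian points calculation itself is just goals plus assists. A minimal sketch on made-up players (the names and numbers are invented) shows how the ranking can differ from a pure goals ranking:

```python
import pandas as pd

# Made-up players, only to illustrate the calculation.
stats = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Goals": [20, 12, 5],
    "Assists": [5, 14, 3],
})

# Canadian points = goals + assists.
stats["Canadian_Points"] = stats["Goals"] + stats["Assists"]
ranking = stats.sort_values("Canadian_Points", ascending=False).reset_index(drop=True)
```

Here Player A has the most goals, but Player B comes out on top in Canadian points (26 versus 25) thanks to the assists.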

 

Goals scored by position on the field:

Curious about the relationship between position on the pitch and the number of goals scored, I checked whether most goals are scored by nominal forwards:

 

# variables
my_labels = df[['Position','Goals']].groupby(['Position']).sum().sort_values(by=['Goals'],ascending=False)\
        .head(10).index.tolist()
my_colors = ['turquoise','lightseagreen','paleturquoise','darkturquoise']
myexplode = [0.2, 0, 0, 0]

 

# plot
df[['Position','Goals']].groupby(['Position']).sum().sort_values(by=['Goals'],ascending=False).head(10)\
        .plot(x='Name',kind='pie', figsize=(16,8), subplots=True, labels=my_labels,startangle=15, 
        shadow=True, colors=my_colors, explode=myexplode, autopct='%1.2f%%', fontsize=16)

plt.title('Goals by position', loc ='left', size=16, fontweight ='bold')
plt.ylabel(None)
plt.axis('equal')

plt.legend([],frameon=False)

# Use the axes of the current chart; `ax` would still point at an earlier figure.
plt.text(0.7, -0.1, 'mariuszborycki.com', horizontalalignment='center',
         verticalalignment='center', transform=plt.gca().transAxes, color='grey', fontsize=12)

 

plt.show()

 

Mariusz Borycki - Goals by position in Premier League 2020-21

 

So there are no major surprises: almost 3 out of every 5 goals were scored by forwards.
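
The percentages in the pie chart are each position's share of all goals. As a rough sketch with made-up totals (not the real figures), the same shares can be computed directly:

```python
import pandas as pd

# Made-up goal totals per position, for illustration only.
goals_by_position = pd.Series({"FW": 30, "MF": 15, "DF": 5, "GK": 0}, name="Goals")

# Share of all goals per position, in percent.
share = (goals_by_position / goals_by_position.sum() * 100).round(2)
```

In this sample, forwards account for 60% of the goals, which corresponds to "3 out of 5".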

 

Assists by position on the field:

Below, as with the pie chart above, I checked which position on the pitch produces the most assists:

 

df_assists = df[['Position','Assists']].groupby(['Position']).sum().sort_values(by=['Assists'], ascending=False)
my_colors = ['turquoise','lightseagreen','paleturquoise','darkturquoise']
myexplode = [0.2, 0, 0, 0]

fig, ax = plt.subplots(figsize=(16, 9))

assists = df_assists.Assists.tolist()
positions = df_assists.index.tolist()

 

def func(xx, yy):
    # Convert the wedge's percentage back to an absolute number of assists.
    absolute = int(round(xx/100.*np.sum(yy)))
    return "{:.1f}%\n({:d} Assists)".format(xx, absolute)

 

ax.pie(assists, autopct=lambda x: func(x, assists), 
       textprops={'color':"black",'size':16}, # 'fontweight':'bold' 
       labels=positions, startangle=15, shadow=True, colors=my_colors, explode=myexplode)


ax.set_title("Assists by position", loc ='left', size=16, fontweight ='bold')

plt.text(0.9, -0.01, 'mariuszborycki.com', horizontalalignment='center',
         verticalalignment='center', transform=ax.transAxes, color='grey', fontsize=12)

 

plt.show()

 

Mariusz Borycki - Assists by position in Premier League 2020-21

 

I really thought midfielders, and even defenders, would have more assists than forwards, but in fact it was a close contest. Still, the forwards are in the lead.

 

Expected goals (xG):

An important but not widely known statistic is "xG". In football, the abbreviation stands for "expected goals": a statistical measure of the quality of goalscoring chances and the likelihood of them being scored.

 

Using this indicator, I can see which clubs wasted the most chances to score:

 

df_goals = df_goals.sort_values(by='xG_diff', ascending=False) 

goals = round(df_goals['xG_diff'], 1)
clubs = df_goals.index
 
fig, ax = plt.subplots(figsize =(16, 9))
ax.bar(clubs, goals, label="Goals", color='darkturquoise')

 

for s in ['top', 'bottom', 'left', 'right']:
    ax.spines[s].set_visible(False)

 

x_goals = goals.tolist()

 

for ig, dg in enumerate(x_goals):
    plt.text(x=ig, y=dg / 2, s=f"{dg}", fontdict=dict(fontsize=12), horizontalalignment='center', verticalalignment='center')

 

ax.set_title('Amount of not scored goals in Premier League (scored goals - expected goals)',
             loc ='left', size=15, fontweight ='bold')

fig.text(0.9, -0.1, 'mariuszborycki.com', fontsize = 12,
         color ='black', ha ='right', va ='top',
         alpha = 0.5)

ax.legend([], frameon=False)

plt.tick_params(left = False, right = False, labelleft = False, labelbottom = True, bottom = True)
plt.xticks(clubs,rotation=90,fontsize=12)

 

plt.show()

 

Mariusz Borycki - NOT scored goals in Premier League 2020-21

 

As can be seen in the chart above, Crystal Palace were the best at converting their chances. Chelsea look interesting in this ranking: they finished 4th in the table at the end of the season, even though they failed to convert about 35 goals' worth of chances.

 

Below is a table with the xG values for all Premier League clubs:

 

Mariusz Borycki - expected goals by club

 

Chelsea players scored 56 goals last season. Based on the calculated xG indicator, they should have scored around 91 goals.
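
One detail of the calculation is worth illustrating: the code in this post multiplies each player's per-match xG by his number of matches, rounds the result to a whole number, and only then sums per club. The sketch below, on made-up numbers, compares that with summing first. It assumes, as in the column description, that `xG` is a per-match value.

```python
import pandas as pd

# Made-up squad, for illustration only.
squad = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Matches": [30, 20, 10],
    "xG": [0.48, 0.12, 0.04],   # expected goals per match
    "Goals": [10, 2, 0],
})

season_xg = squad["xG"] * squad["Matches"]            # about 14.4, 2.4 and 0.4

# Variant used in the post: round per player, then sum.
club_xg_rounded_first = int(season_xg.round(0).astype(int).sum())

# Alternative: sum the unrounded values.
club_xg_summed_first = round(float(season_xg.sum()), 1)

club_goals = int(squad["Goals"].sum())
diff_rounded_first = club_goals - club_xg_rounded_first
diff_summed_first = round(club_goals - club_xg_summed_first, 1)
```

In this sample the two variants give a club total of 16 versus 17.2 expected goals, so the club-level differences should be read as approximate values.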

 

Expected Goals - negative xG ratio:

I also checked the TOP 10 players with a positive goals-to-xG difference and the TOP 10 players with a negative one.

 

First, I looked at the ten players who wasted the most good chances to score:

 

df_goals = df.copy()
df_goals['xGoals'] = round(df_goals.xG * df_goals.Matches,0).astype(int)

df_goals['xG_diff'] =  (df_goals.Goals - df_goals.xGoals).astype(int)


# Keep only the numeric columns (plus the grouping key) before summing per player.
df_goals = df_goals[['Name', 'Goals', 'xGoals', 'xG_diff', 'Assists']]\
.groupby(['Name']).sum()
df_goals['Canadian_Points'] = df_goals.Goals + df_goals.Assists
df_goals = df_goals.sort_values(by='xG_diff', ascending=True)

goals = df_goals.sort_values(by='xG_diff',ascending=True).head(10)['xG_diff']
names = df_goals.sort_values(by='xG_diff',ascending=True).head(10).index
 
fig, ax = plt.subplots(figsize =(16, 9))
ax.bar(names, goals, label="Goals", color='turquoise')

 

for s in ['top', 'bottom', 'left', 'right']:
    ax.spines[s].set_visible(False)

 

x_goals = goals.tolist()

 

for ig, dg in enumerate(x_goals):
    plt.text(x=ig, y=dg / 2, s=f"{dg}", fontdict=dict(fontsize=12), horizontalalignment='center', verticalalignment='center')

 

ax.set_title('Amount of not scored goals in Premier League (scored goals - expected goals)',
             loc ='left', size=15, fontweight ='bold')

fig.text(0.9, -0.1, 'mariuszborycki.com', fontsize=12,
         color='black', ha='right', va='top',
         alpha = 0.5)

ax.legend([], frameon=False)

plt.tick_params(left=False, right=False, labelleft=False, labelbottom=True, bottom=True)
plt.xticks(names, rotation=90, fontsize=12)

 

plt.show()

 

Mariusz Borycki - LOW 10 for expected goals in Premier League 2020-21

 

Fabio Silva scored 9 fewer goals than his xG suggested. Three players share second place with a shortfall of eight goals each: Aleksandar Mitrovic of Fulham FC, Matej Vydra of Burnley FC and Timo Werner of Chelsea FC.

 

Expected Goals - positive xG ratio:

Then, on the same principle, I wanted to see the ten players who scored more goals than the quality of their chances suggested:

 

goals = df_goals.sort_values(by='xG_diff',ascending=False).head(10)['xG_diff']
names = df_goals.sort_values(by='xG_diff',ascending=False).head(10).index

fig, ax = plt.subplots(figsize =(16, 9))
ax.bar(names, goals, label="Goals", color='turquoise')

 

for s in ['top', 'bottom', 'left', 'right']:
    ax.spines[s].set_visible(False)

 

x_goals = goals.tolist()

 

for ig, dg in enumerate(x_goals):
    plt.text(x=ig, y=dg / 2, s=f"{dg}", fontdict=dict(fontsize=12), horizontalalignment='center', verticalalignment='center')

 

ax.set_title('Expected Goals - TOP10 (scored goals - expected goals)',
             loc ='left', size=15, fontweight ='bold')

fig.text(0.9, -0.1, 'mariuszborycki.com', fontsize = 12,
         color ='black', ha ='right', va ='top',
         alpha = 0.5)

ax.legend([], frameon=False)

plt.tick_params(left = False, right = False, labelleft = False, labelbottom = True, bottom = True)
plt.xticks(names,rotation=90,fontsize=12)

 

plt.show()

 

Mariusz Borycki - TOP 10 expected goals in Premier League 2020-21

 

In first place is Tottenham Hotspur's Heung-min Son, who scored 6 more goals than his xG suggested.

 

 


PASSES ATTEMPTED

 

Another area I looked at was the TOP 10 players with the most attempted passes in the Premier League last season:

 

df[['Name','Club','Passes_Attempted']].groupby(['Name','Club']).sum().reset_index().sort_values(by=['Passes_Attempted'],ascending=False).head(10).plot(x='Name', kind='bar', figsize=(16,8), width=0.8, color='turquoise', edgecolor='forestgreen', grid=True)

passes = df[['Name','Club','Passes_Attempted']].groupby(['Name','Club']).sum().reset_index().sort_values(by=['Passes_Attempted'], ascending=False).head(10)['Passes_Attempted'].to_list()


for index, data in enumerate(passes):
    plt.text(x=index, y=data * 0.9, s=f"{data}", fontdict=dict(fontsize=14), horizontalalignment='center')

 

plt.grid(color='lightgrey', linestyle='-', linewidth=0.7, alpha=0.2)
plt.xlabel(None)
plt.xticks(rotation=45, size=14)
plt.title('Passes Attempted - Top 10 players', loc='left', size=15, fontweight='bold')

plt.legend([], frameon=False)

# Use the axes of the current chart; `ax` would still point at an earlier figure.
plt.text(0.8, -0.15, 'mariuszborycki.com', horizontalalignment='center',
         verticalalignment='center', transform=plt.gca().transAxes, color='grey', fontsize=14)

 

plt.show()

 

Mariusz Borycki - Passes Attempted - TOP10 players in Premier League 2020-21

 

The chart above shows that the field was quite even. Andrew Robertson stands out: he leads the league with 3,214 attempted passes in the season, slightly ahead of the other players.

 

I also compiled a list of the players with the most accurate passing. Importantly, I only considered players with more than 1000 attempted passes in the season, in order to exclude players who played very little:

 

df.loc[df.Passes_Attempted>1000,['Name', 'Club', 'Perc_Passes_Completed']].groupby(['Name', 'Club']).sum().reset_index().sort_values(by=['Perc_Passes_Completed'],ascending=False).head(15).plot(x='Name', kind='bar', figsize=(18,8), color='turquoise', grid=False, width=0.90)

plt.title('Accurate Passes Attempted (%) - Top 15 players', loc='left', size=15, fontweight='bold')

passes_accur = df.loc[df.Passes_Attempted>1000,['Name','Club', 'Perc_Passes_Completed']].groupby(['Name','Club']).sum().reset_index().sort_values(by=['Perc_Passes_Completed'], ascending=False).head(15)['Perc_Passes_Completed'].tolist()


for index, data in enumerate(passes_accur):
    plt.text(x=index, y=data -1, s=f"{data}%", fontdict=dict(fontsize=14), horizontalalignment='center', color='w',fontweight ='bold')

 

plt.xticks(rotation=45, size=14)
plt.yticks(size=14)
plt.xlabel(None)
plt.ylim(86, 96)
plt.tick_params(left = False, right = False, labelleft = False, labelbottom = True, bottom = True)

plt.legend([], frameon=False)

# Use the axes of the current chart; `ax` would still point at an earlier figure.
plt.text(0.88,-0.15, 'mariuszborycki.com', horizontalalignment='center',
         verticalalignment='center', transform=plt.gca().transAxes, color='grey', fontsize=14)

 

plt.show()

 

Mariusz Borycki - Accuracy of Passes Attempted - TOP15 players in Premier League 2020-21
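
The effect of the 1000-pass threshold can be sketched on made-up data: without it, a player with only a handful of passes could top the accuracy ranking. The names and numbers below are invented, and `min_passes` is a hypothetical variable name.

```python
import pandas as pd

# Made-up players, for illustration only.
passing = pd.DataFrame({
    "Name": ["Player A", "Player B", "Player C"],
    "Passes_Attempted": [2500, 40, 1800],
    "Perc_Passes_Completed": [91.0, 97.5, 93.2],
})

min_passes = 1000  # same threshold as in the analysis
qualified = passing[passing["Passes_Attempted"] > min_passes]
top = qualified.sort_values("Perc_Passes_Completed", ascending=False)
```

Player B has the highest accuracy but only 40 attempted passes, so he is filtered out and Player C leads the ranking instead.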

 

 


PLAYERS AGE

Another area I checked was the age of the players. To begin with, I have prepared a list of the 15 oldest players in the league:

 

df_age_high = df[['Name', 'Club', 'Position', 'Age']].sort_values(by=['Age'], ascending=False).head(15)

name = df_age_high['Name']
age = df_age_high['Age']

fig, ax = plt.subplots(figsize=(16,10))
ax.bar(name,age,color='lightseagreen')

 

for s in ['top', 'bottom', 'left', 'right']:
    ax.spines[s].set_visible(False)

    
for index, data in enumerate(age):
    plt.text(x=index, y=data * 0.9, s=f"{data}", fontdict=dict(fontsize=14), horizontalalignment='center', color='w', fontsize=16)

 

ax.set_title('Top 15 the oldest players',
             loc ='left', size=15, fontweight ='bold' )
 
fig.text(0.9, -0.05, 'mariuszborycki.com', fontsize = 12,
         color ='grey', ha ='right', va ='bottom',
         alpha = 0.7)

plt.tick_params(left = False, right = False, labelleft = False, labelbottom = True, bottom = True)

ax.invert_yaxis()
plt.yticks(fontsize=12)
plt.xticks(fontsize=14, rotation=45)


plt.show()

 

Mariusz Borycki - the oldest players in Premier League 2020-21

 

In the 2020/21 Premier League season, there were 9 players aged 35 or over. The oldest of them was the Argentine goalkeeper Willy Caballero of Chelsea FC, who was 38 years old when the season began in 2020.

 

The 15 youngest players in the league:

 

df_age_low = df[['Name', 'Club', 'Position', 'Age']].sort_values(by=['Age'], ascending=True).head(15)

name = df_age_low['Name']
age = df_age_low['Age']

fig, ax = plt.subplots(figsize=(16,10))
ax.bar(name,age,color='lightseagreen')

 

for s in ['top', 'bottom', 'left', 'right']:
    ax.spines[s].set_visible(False)


for index, data in enumerate(age):
    plt.text(x=index, y=data * 0.9, s=f"{data}", fontdict=dict(fontsize=14), horizontalalignment='center', color='w', fontsize=16)

 

ax.set_title('Top 15 the youngest players',
             loc ='left', size=15, fontweight ='bold' )
 
fig.text(0.9, -0.05, 'mariuszborycki.com', fontsize = 12,
         color ='grey', ha ='right', va ='bottom',
         alpha = 0.7)

plt.tick_params(left = False, right = False, labelleft = False, labelbottom = True, bottom = True)

ax.invert_yaxis()
plt.yticks(fontsize=12)
plt.xticks(fontsize=14, rotation=45)


plt.show()

 

Mariusz Borycki - the youngest players in Premier League 2020-21

 

Last season, there were 12 players under 18 in the English Premier League clubs.
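
Counts like these do not have to be read off the charts; they can be taken straight from the `Age` column. A minimal sketch on made-up ages:

```python
import pandas as pd

# Made-up ages, for illustration only.
ages = pd.Series([16, 17, 17, 22, 29, 35, 38], name="Age")

under_18 = int((ages < 18).sum())        # players younger than 18
aged_35_plus = int((ages >= 35).sum())   # players aged 35 or over
```

A boolean comparison on a Series yields True/False per player, and summing it counts the True values.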

 

Average age by position:

 

df_position = df[['Position','Age']].groupby(['Position']).mean().round(1).sort_values(by='Age')

fig, ax = plt.subplots(figsize=(16, 10))
age = df_position['Age']
position = df_position.index
ax.bar(position,age,color='lightseagreen')

 

for s in ['top', 'bottom', 'left', 'right']:
    ax.spines[s].set_visible(False)


for index, data in enumerate(age):
    plt.text(x=index, y=data * 0.9, s=f"{data}", fontdict=dict(fontsize=14), horizontalalignment='center', color='w', fontsize=16)

 

ax.legend([], frameon=False)
ax.set_title("Average age by positions in Premier League", size=16, loc='left', fontweight='bold')

plt.tick_params(left = False, right = False, labelleft = False, labelbottom = True, bottom = True)
 
fig.text(0.9, -0.05, 'mariuszborycki.com', fontsize = 12,
         color ='grey', ha ='right', va ='bottom',
         alpha = 0.7)

plt.yticks(fontsize=12)
plt.xticks(fontsize=14, rotation=45)

 

plt.show()

 

Mariusz Borycki - Avg age by position in Premier League 2020-21

 

There are no major surprises in the chart above. The players' ages are fairly even across positions, apart from goalkeepers (28.2 years), whose average age is 3 years higher than that of midfielders (25.1 years).

 

Age of players for each Premier League team:

 

df_club_ages = df[['Club','Age']].groupby('Club').mean().round(1).sort_values(by=['Age'], ascending=True)

fig, ax = plt.subplots(figsize=(16, 10))
age = df_club_ages['Age']
club = df_club_ages.index
ax.bar(club,age,color='lightseagreen')

 

for s in ['top', 'bottom', 'left', 'right']:
    ax.spines[s].set_visible(False)


for index, data in enumerate(age):
    plt.text(x=index, y=data * 0.9, s=f"{data}", horizontalalignment='center', color='w', fontsize=14)

 

ax.legend([], frameon=False)
ax.set_title("Average age in all clubs in Premier League",
             loc ='left', size=15, fontweight ='bold')

plt.tick_params(left = False, right = False, labelleft = False, labelbottom = True, bottom = True)
 
fig.text(0.9, -0.09, 'mariuszborycki.com', fontsize = 12,
         color ='grey', ha ='right', va ='bottom',
         alpha = 0.7)

plt.yticks(fontsize=12)
plt.xticks(fontsize=14, rotation=90)

 

plt.show()

 

Mariusz Borycki - average age in all EPL clubs in Premier League 2020-21

 

Crystal Palace is the oldest team in the EPL, with players on average 4 years older than those of the youngest team in the league, Manchester United.

 

Age range in all clubs in Premier League:

 

plt.figure(figsize=(20,10))
sns.boxplot(x='Club', y='Age', data=df.sort_values(by='Age'))
plt.yticks(fontsize=12)
plt.xticks(fontsize=14, rotation=90)
plt.xlabel(None)
plt.ylabel('Age', size=14)

plt.title("Age range in all clubs in Premier League",
             loc='left', size=18, fontweight='bold')

# Use the axes of the current chart; `ax` would still point at an earlier figure.
plt.text(1.05, -0.15, 'mariuszborycki.com', horizontalalignment='center',
         verticalalignment='center', transform=plt.gca().transAxes, color='grey', fontsize=14)

plt.show()

 

Mariusz Borycki - age range in EPL clubs in Premier League 2020-21

 

In the box plot above, you can see the age range, from the youngest to the oldest player, for each Premier League club during the season.

Tottenham is a good example: the youngest player is only 16 years old, and the next youngest is 5 years older, at 21.

The oldest "Spurs" player is 33 years old, while the median age at the club is between 25 and 26.
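
The key numbers a box plot summarises can also be computed as a table. The sketch below uses made-up squads (not the real club data) to show the minimum, median and maximum age per club:

```python
import pandas as pd

# Made-up squads, for illustration only.
squad_ages = pd.DataFrame({
    "Club": ["Club X"] * 5 + ["Club Y"] * 4,
    "Age": [16, 21, 25, 26, 33, 19, 23, 27, 31],
})

# Youngest player, median age and oldest player per club.
summary = squad_ages.groupby("Club")["Age"].agg(["min", "median", "max"])
```

With seaborn's default settings, the whiskers extend at most 1.5 times the interquartile range beyond the box, and players outside that range are drawn as individual points, so a very young or very old player may appear as a separate dot rather than as the end of a whisker.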

 

Nationality in Premier League:

 

df.loc[df.Nationality!='ENG',['Club', 'Nationality']].drop_duplicates().groupby('Club').count().reset_index().sort_values(by= 'Nationality').plot(x='Club', kind='bar', figsize=(18,10), color='darkturquoise', grid=False, width=0.85)


plt.title('Amount of nations in the Premier League clubs (excluding ENG)', loc='left', size=18, fontweight ='bold')

nationalities = df.loc[df.Nationality!='ENG', ['Club', 'Nationality']].drop_duplicates().groupby('Club').count().sort_values(by='Nationality')['Nationality'].tolist()


for index, data in enumerate(nationalities):
    plt.text(x=index, y=data -1, s=f"{data}", fontdict=dict(fontsize=18), horizontalalignment='center', color='w')

 

plt.xticks(rotation=90, size=14)
plt.yticks(size=14)
plt.xlabel(None)
plt.tick_params(left = False, right = False, labelleft = False, labelbottom = True, bottom = True)

plt.legend([], frameon=False)


# Use the axes of the current chart; `ax` would still point at an earlier figure.
plt.text(0.88, -0.05, 'mariuszborycki.com', horizontalalignment='center',
         verticalalignment='center', transform=plt.gca().transAxes, color='grey', fontsize=14)

 

plt.show()

 

Mariusz Borycki - Nationality in EPL clubs in Premier League 2020-21

 

The last chart shows how many different nationalities were represented in each Premier League club, excluding English players. It may be surprising that a single club, such as Liverpool or Fulham, had players of as many as 16 different nationalities.

 

In the case of "The Reds", we are talking about a club that finished 3rd in the league, so this multiculturalism may have had a positive impact on its results. However, the same cannot be said of the London side Fulham F.C., which finished 18th, 11 points behind the next team in the table, Burnley, and was relegated from the league.
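
As an aside, the number of distinct nationalities per club can also be counted with `nunique`, as an alternative to the `drop_duplicates` plus `count` approach used above. A minimal sketch on made-up data:

```python
import pandas as pd

# Made-up players, for illustration only.
players = pd.DataFrame({
    "Club": ["Club X", "Club X", "Club X", "Club X", "Club Y", "Club Y"],
    "Nationality": ["ENG", "BRA", "BRA", "SEN", "ENG", "FRA"],
})

# Exclude English players, then count distinct nationalities per club.
foreign = players[players["Nationality"] != "ENG"]
nations_per_club = foreign.groupby("Club")["Nationality"].nunique()
```

Two players sharing a nationality are counted once, so Club X ends up with 2 nationalities and Club Y with 1.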

 

 


IN CONCLUSION

If you have any questions about my analysis, or conclusions of your own, feel free to leave a comment.

If you have ideas for further projects or any other questions, please do not hesitate to contact me.

 

All scripts and their descriptions can be found in my repository on GitHub and on Kaggle, where you are cordially invited.
