Covid vs Flu: The Impact of Covid-19 on Influenza in Europe

Published on June 8, 2021 by Mariusz Borycki

Photo by Fusion Medical Animation on Unsplash

 

One of the most commonly analyzed reports of 2020 is the one showing the number of Covid-19 infections. I would like to look at it from a slightly different angle. It is often said that if anything positive came out of the 2020/2021 pandemic, it is the defeat of the influenza virus. With the short analysis below, I wanted to check whether the coronavirus really contributed to a significant reduction, or even the complete disappearance, of the flu, both in Europe and in Poland.


Data sources:

The numbers of detected Covid-19 and influenza cases were obtained from two different sources, described below:

 

Influenza:

To download the flu data, I went to the Flu News Europe website (link below) and filtered the report shown there by period and country. I was then able to download a spreadsheet with the chosen data.

 

Mariusz Borycki - flu_report

Flu Virus Characteristics - https://flunewseurope.org/VirusCharacteristics

 

Data Transformation / Data Cleaning:

I converted the downloaded Excel file to .csv format and uploaded it to my GitHub repository. The tool I use most often for data analysis is Jupyter Notebook.

First, I imported all the Python libraries I needed and loaded the first file, containing the number of influenza cases, into a table named "flu_detected".

 

import pandas as pd
import matplotlib.pyplot as plt  # used for all the charts below
import os
from pathlib import Path
from bs4 import BeautifulSoup
import requests
import re
import io

# CSV copy of the flu report stored in my GitHub repository
url_flu = "https://raw.githubusercontent.com/mborycki/Covid_Influenza_Comparison/main/Influenza_virus_detections_in_Europe.csv"

flu_detected = pd.read_csv(url_flu)
flu_detected.head()

 

Mariusz Borycki - flu_detected_eng

 

As you can see, the table has 11 columns. First, I checked what the "Region" column contains. It has two values:

  • EU/EEA - countries of the European Union and the European Economic Area
  • WHO Europe - all countries of the WHO European Region, which also includes countries outside the EU/EEA


I suspected that the table contains duplicates: a country that belongs to the European Economic Area would appear under both "EU/EEA" and "WHO Europe". It is worth checking this on an example country. Let it be Poland, which is part of the EEA:

 

Mariusz Borycki - flu_detected_analasis


As you can see, the values for Poland are the same under "EU/EEA" and "WHO Europe". Therefore, I kept only the rows assigned to "WHO Europe", which cover all of the EEA countries as well as those outside the EEA.
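The duplicate check and the filtering step can be written in a few lines of pandas. This is a minimal sketch, not the code from my notebook. The helper names eu_rows_are_duplicates and keep_who_europe are hypothetical, the toy table is made up, and the sketch assumes the "Region" labels are exactly "EU/EEA" and "WHO Europe", as listed above.

```python
import pandas as pd


def eu_rows_are_duplicates(flu: pd.DataFrame) -> bool:
    """Return True if every 'EU/EEA' row has an identical 'WHO Europe' row.

    All columns except 'Region' are compared.
    """
    eu = flu[flu['Region'] == 'EU/EEA'].drop(columns='Region')
    who = flu[flu['Region'] == 'WHO Europe'].drop(columns='Region').drop_duplicates()
    # Left merge on all shared columns; indicator=True adds a '_merge' column
    merged = eu.merge(who, how='left', indicator=True)
    return bool((merged['_merge'] == 'both').all())


def keep_who_europe(flu: pd.DataFrame) -> pd.DataFrame:
    """Keep only the 'WHO Europe' rows, which already include the EU/EEA countries."""
    return flu[flu['Region'] == 'WHO Europe'].reset_index(drop=True)


# Hypothetical mini-table with the same idea as the flu report
toy = pd.DataFrame({
    'Region': ['EU/EEA', 'WHO Europe', 'WHO Europe'],
    'Country': ['Poland', 'Poland', 'Ukraine'],
    'Week': ['2020-W05', '2020-W05', '2020-W05'],
    'A (H3)': [12, 12, 7],
})
print(eu_rows_are_duplicates(toy))  # True
print(len(keep_who_europe(toy)))    # 2
```

If the check returned False for the real data, dropping the "EU/EEA" rows would lose information, so it is worth running before filtering.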


Next, I looked at a few details of the table (the output of flu_detected.info()):

 

Mariusz Borycki - flu_detected-info

 

As you can see, there are 9 763 rows and 11 columns. We can also see the data types (integer "int64" or text "object"), and we know that the table has no missing values, because the non-null count of every column equals the number of rows.
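You can also confirm this programmatically instead of reading it from the info() output, by counting the missing cells in each column. Below is a minimal sketch. The helper name missing_values_per_column is hypothetical, and the sample table is made up. In the article, the same call would be made on flu_detected.

```python
import pandas as pd


def missing_values_per_column(df: pd.DataFrame) -> pd.Series:
    """Count missing (NaN/None) cells in each column."""
    return df.isna().sum()


# Made-up sample with one empty cell
sample = pd.DataFrame({'Country': ['Poland', 'Germany'], 'A (H3)': [3, None]})
print(missing_values_per_column(sample))
print(missing_values_per_column(sample).sum() == 0)  # False: one cell is empty
```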


Next, I planned a few tasks based on the table we have opened:

  • Split the "Week" column into two separate columns: "Year" and "Week"
  • Check whether I need the "Surveillance System Type" column
  • Consider whether I need the "Season" and "Region" columns, and delete them if not
  • See (out of curiosity) the share of each influenza type in the total number of reported cases
  • Sum up all flu types, as we only need the total number of flu cases
  • Create a pivot table that will make the analysis easier

 

I started by renaming the "Week" column to "YearWeek", as the values in that column actually contain both the year and the week (see the table above):

flu_detected.rename(columns={'Week':'YearWeek'},inplace=True)


Then I checked the unique values in the "Surveillance System Type" column, which has only one value, "Non-sentinel", and in the "Season" column, which has the following values:

flu_detected['Surveillance System Type'].unique()

out: ['Non-sentinel']

 

flu_detected['Season'].unique()

out: ['2015/2016', '2016/2017', '2017/2018', '2018/2019', '2019/2020', '2020/2021']

 

I decided to remove the "Surveillance System Type" and "Season" columns, as well as "Region" (only the "WHO Europe" rows are left, so it no longer carries any information). The first one contains only a single value that does not matter to me, and the second one can be derived from the "YearWeek" column:

flu_detected = flu_detected.drop(['Season', 'Region', 'Surveillance System Type'], axis = 1)

 

The next step is to sum all the flu types into a single total:

 

Mariusz Borycki - flu_detected-total_cases
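The screenshot above does not show the code, so here is a minimal sketch of how the total could be computed. It assumes that the six flu-type columns carry the original names used in the renaming step later on, and that the new column is called "Total Detected Cases", as in the "Flu Type" values shown later. The helper name add_total_cases and the sample row are hypothetical.

```python
import pandas as pd

# Original flu-type column names (the same labels as in the renaming step)
FLU_TYPE_COLUMNS = ['A not subtyped', 'A (H1) pdm09', 'A (H3)',
                    'B lineage not determined', 'B / Vic', 'B / Yam']


def add_total_cases(flu: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a 'Total Detected Cases' column summing all flu types per row."""
    out = flu.copy()
    out['Total Detected Cases'] = out[FLU_TYPE_COLUMNS].sum(axis=1)
    return out


sample = pd.DataFrame({
    'Country': ['Poland'], 'YearWeek': ['2020-W05'],
    'A not subtyped': [10], 'A (H1) pdm09': [4], 'A (H3)': [1],
    'B lineage not determined': [3], 'B / Vic': [2], 'B / Yam': [0],
})
print(add_total_cases(sample)['Total Detected Cases'].iloc[0])  # 20
```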

 

Now it's time to restructure the table (unpivot it into a long format), which improves the readability of the data above:

 

mariusz-borycki-flu_detected-pivot_table

 

I think the table looks much better now. All flu type names are in the "Flu Type" column, the values are in the "Detected_Cases" column, and the "YearWeek" column has been split into two separate columns, "Year" and "Week".
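The reshaping described above is a wide-to-long "melt" followed by splitting "YearWeek". Below is a minimal sketch, not the notebook code. It assumes the identifier columns are "Country" and "YearWeek" and that week labels look like "2020-W05" (year, hyphen, "W" plus the week number). This fits the later steps, which compare years as strings and strip the "W". The helper name to_long_format and the sample data are hypothetical.

```python
import pandas as pd


def to_long_format(flu: pd.DataFrame) -> pd.DataFrame:
    """Unpivot flu-type columns into 'Flu Type'/'Detected_Cases' and split 'YearWeek'."""
    long_df = flu.melt(id_vars=['Country', 'YearWeek'],
                       var_name='Flu Type', value_name='Detected_Cases')
    # Assumed label layout: '2020-W05' -> Year '2020', Week 'W05'
    parts = long_df['YearWeek'].str.split('-', expand=True)
    long_df['Year'] = parts[0]
    long_df['Week'] = parts[1]
    return long_df.drop(columns='YearWeek')


wide = pd.DataFrame({
    'Country': ['Poland', 'Poland'],
    'YearWeek': ['2020-W05', '2020-W06'],
    'A (H3)': [1, 2],
    'Total Detected Cases': [5, 8],
})
flu_detected2 = to_long_format(wide)
print(flu_detected2)
```

Each flu-type column becomes a block of rows, so the two-column sample above turns into four rows.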


Although I was mainly interested in the total number of detected influenza cases, I also wanted to see the share of each flu type. Before doing that, I shortened the name of each flu type a bit to make the chart easier to read:

original_type_names = ['A not subtyped', 'A (H1) pdm09', 'A (H3)', 'B lineage not determined', 'B / Vic', 'B / Yam']

new_type_names = ['A', 'A (H1)', 'A (H3)', 'B', 'B / Vic', 'B / Yam']

# Replace each original label with its shorter version
for o, n in zip(original_type_names, new_type_names):
    flu_detected2.loc[(flu_detected2['Flu Type'] == o), 'Flu Type'] = n

 


Now the values in the "Flu Type" column look like this:

flu_detected2['Flu Type'].unique()

out: ['A', 'A (H1)', 'A (H3)', 'B', 'B / Vic', 'B / Yam', 'Total Detected Cases']

 

I wanted to check whether we have enough data for each year. To do this, I checked the total number of detected flu cases and the number of weeks available in each year:

flu_detected2[(flu_detected2['Flu Type'] == 'Total Detected Cases')].sort_values(['Year']).groupby(['Flu Type', 'Year'])['Detected_Cases'].sum().reset_index()

 

Mariusz Borycki - flu_detected_cases

 

years = flu_detected2.Year.unique()

for year in years:
    print (f'In {year} we have {len(flu_detected2[flu_detected2.Year == year].Week.unique())} weeks')

 

Mariusz Borycki - flu_detected_weeks

 

As you can see above, 2015 is incomplete (we only have data for 14 weeks). Of course, 2021 does not look much better, but this is because I downloaded the data at the end of May 2021. Therefore, I removed only 2015 from the table:

years_list = ['2016','2017','2018','2019','2020', '2021'] 

flu_detected2 = flu_detected2[flu_detected2['Year'].isin(years_list)]

 


In the script below, I checked the share of each flu type in 2016-2021:

# Sum of detected cases per flu type and year (the total is excluded)
df1 = flu_detected2[flu_detected2['Flu Type'] != 'Total Detected Cases'].groupby(['Flu Type', 'Year'])['Detected_Cases'].sum().reset_index()

# One row per year, one column per flu type; missing combinations become 0
df2 = df1.pivot(index="Year", columns="Flu Type", values="Detected_Cases").fillna(0)

# Pass the size to subplots(): changing rcParams after the figure exists has no effect on it
fig, ax1 = plt.subplots(figsize=(25, 15))
plt.grid(color='grey', linestyle='--', linewidth=0.5)

width = 0.8
bottom = 0

# Stacked bars: each flu type starts where the previous one ended
for i in df2.columns:
    plt.bar(df2.index, df2[i], width=width, bottom=bottom)
    bottom += df2[i]

plt.title("Influenza Cases per year in Europe", fontsize=28)
plt.xlabel('Years', fontsize=24)
plt.ylabel("Detected Flu Cases", color='black', fontsize=24)
plt.xticks(fontsize=16, rotation=45)
plt.tick_params(axis='y', labelcolor='black', labelsize=16)
plt.legend(df2.columns, fontsize=16)
plt.tight_layout()
plt.show()

 

 

Mariusz Borycki - flu_cases-bar_chart_Europe


As you can see in the chart above, 2021 appears to be almost free of the flu virus. In the remaining years, the cases are a mix of type A influenza, which infects both humans and animals (pigs, horses, seals, minks, whales and birds), and type B, which occurs almost exclusively in humans. You can find a lot of information about each type of flu in many places, including Wikipedia. I encourage you to check it out.

 

I didn't need the data for each flu type anymore, so I made another table with the total flu cases. I called the table "df_flu":

df_flu = flu_detected2[(flu_detected2['Flu Type']=='Total Detected Cases')].sort_values(['Year']).groupby(['Country','Year', 'Week'])['Detected_Cases'].sum().reset_index()

 

Next, I looked at the total number of flu cases per year in 2016-2021:

# Total number of detected flu cases per year
totals = df_flu.groupby('Year')['Detected_Cases'].sum()
xs = totals.index
ys = totals.values

plt.figure(figsize=(15, 10))
plt.plot(xs, ys, 'bo-')
plt.title('Total influenza cases for all countries', fontsize=24)

# Label every point with its value, using a thousands separator
for x, y in zip(xs, ys):
    label = f'{y:,}'
    plt.annotate(label, (x, y), textcoords="offset points", xytext=(0, 10), ha='center')
plt.grid(color='green', linestyle='--', linewidth=0.5)

plt.show()

 

 

Mariusz Borycki - flu_cases-line_chart_Europe

 

At first glance, you can see a decrease in flu cases in 2019-2021 (up to May). It is true that the 2021 data ends in May, but (as far as I remember) most cases occur in the first quarter of each year. I verified this later in the analysis.

 

Finally, I removed the letter "W" from the "Week" column so that it holds integer week numbers, e.g. 1 instead of "W01". I also converted the "Year" column to integers:

# 'W01' -> '01' -> 1
df_flu['Week'] = df_flu['Week'].str.lstrip('W').astype(int)

df_flu['Year'] = df_flu['Year'].astype(int)

 

Done. Here is one last look at the final table with the number of flu cases in Europe:

Mariusz Borycki - flu_detected_sorted


It's time for the second report, which shows the number of SARS-CoV-2 infections in Europe. The goal is for this second table to have the same form as the one above (for flu), so that I can combine the two tables and compare them.

 


COVID-19:

 

All the COVID-19 data I used were taken from the repository of the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University.
Website address: https://github.com/CSSEGISandData/COVID-19


These data are the basis of the very popular Johns Hopkins dashboard, which presents the current state of the pandemic.

 

To download all of the reports from this repository, I used the "Beautiful Soup" library. The links to all the reports were saved in the "urls" variable, and the date of each file was saved in "df_list_names":

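url = "https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_daily_reports"
r = requests.get(url)

html_doc = r.text
soup = BeautifulSoup(html_doc, 'html.parser')  # name the parser explicitly to avoid a warning
a_tags = soup.find_all('a')

# Turn every link to a .csv file into a raw.githubusercontent.com download link
# (links without an href are skipped instead of raising a TypeError)
urls = ['https://raw.githubusercontent.com' + re.sub('/blob', '', link.get('href'))
        for link in a_tags if '.csv' in (link.get('href') or '')]

# The file name without the extension (e.g. '01-01-2021') is the report date
df_list_names = [u.rsplit('/', 1)[-1].replace('.csv', '') for u in urls]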

 

A look at the "urls" and "df_list_names" variables:

urls[:2]

out:

['https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/01-01-2021.csv',
'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_daily_reports/01-02-2021.csv']

 

df_list_names[:5]

out: ['01-01-2021', '01-02-2021', '01-03-2021', '01-04-2021', '01-05-2021']
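Because the daily files are named MM-DD-YYYY.csv, as the output above shows, the list of URLs can also be generated from dates. This avoids relying on the layout of GitHub's HTML page. The sketch below assumes one report per day starting on 22 January 2020. The end date of 31 May 2021 is only an example matching when I downloaded the data. The helper name daily_report_urls is hypothetical, and a day without a file would simply fail to download in the loop.

```python
from datetime import date, timedelta

# Raw-file prefix taken from the URLs shown above
BASE = ('https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/'
        'csse_covid_19_data/csse_covid_19_daily_reports/')


def daily_report_urls(start: date, end: date) -> list:
    """Build one URL per day (inclusive range), using the MM-DD-YYYY.csv naming."""
    urls = []
    day = start
    while day <= end:
        urls.append(f"{BASE}{day.strftime('%m-%d-%Y')}.csv")
        day += timedelta(days=1)
    return urls


urls = daily_report_urls(date(2020, 1, 22), date(2021, 5, 31))
print(urls[0].rsplit('/', 1)[-1])  # 01-22-2020.csv
```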

 

The downloaded files from the repository contain the following columns:

FIPS, Admin2, Province_State, Country_Region, Last_Update, Lat, Long_, Confirmed, Deaths, Recovered, Active, Incident_Rate, Case_Fatality_Ratio (%)

 

However, I decided that I would only need the following columns:

Country_Region, Last_Update, Confirmed, Deaths, Recovered.

 

I also kept two columns that might be useful later: Lat and Long_.

 

After downloading the data, it turned out that the column names changed slightly over time, and that the earliest files have fewer columns. Because of this, I had to add conditional statements to the code:

col_names1 = ['Country_Region','Last_Update','Lat','Long_','Confirmed','Deaths','Recovered']
col_names2 = ['Country/Region','Last Update','Latitude','Longitude','Confirmed','Deaths','Recovered']
col_names3 = ['Country/Region','Last Update','Confirmed','Deaths','Recovered']

# Common column names applied to every file, whatever its original header
cols = ['Country_Region', 'Last_Update', 'Lat', 'Long_', 'Confirmed', 'Deaths', 'Recovered', 'File_Name']

frames = []
for count, url in enumerate(urls):
    download = requests.get(url).content
    df = pd.read_csv(io.StringIO(download.decode('utf-8')))

    if df.shape[1] in (12, 14):
        df = df[col_names1].copy()
    elif df.shape[1] == 8:
        df = df[col_names2].copy()
    elif df.shape[1] == 6:
        # The earliest files have no coordinates, so they are filled with 0
        df = df[col_names3].copy()
        df['Lat'] = 0
        df['Long_'] = 0
        df = df[['Country/Region','Last Update','Lat','Long_','Confirmed','Deaths','Recovered']]
    else:
        print(f'We have {df.shape[1]} columns in {url} file')
        continue  # skip files with an unexpected layout

    df['File_Name'] = df_list_names[count]
    df.columns = cols
    frames.append(df)

# DataFrame.append was removed in pandas 2.0; concatenating once is also much faster
df_covid = pd.concat(frames, ignore_index=True)

 

A glance at the downloaded data:

 

Mariusz Borycki - covid-head

 

 

Data Transformation / Data Cleaning

The next step is to convert the "Last_Update" and "File_Name" columns to the datetime data type:

 

Mariusz Borycki - covid_to-datatime
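The conversion itself is only shown as a screenshot. A minimal sketch is below. The sample values are made up, and the layouts of "Last_Update" are an assumption based on the fact that the files changed format over time. "File_Name" follows one known pattern (MM-DD-YYYY), so an explicit format works. For a column that mixes several date layouts, parsing each value separately is the safe option. With pandas 2.0 or later, pd.to_datetime(..., format='mixed') does the same in one call.

```python
import pandas as pd

# Made-up values in two different layouts
sample = pd.DataFrame({
    'Last_Update': ['1/22/2020 17:00', '2021-01-02 05:22:33'],
    'File_Name': ['01-22-2020', '01-02-2021'],
})

# File names share a single, known pattern
sample['File_Name'] = pd.to_datetime(sample['File_Name'], format='%m-%d-%Y')
# Parse each timestamp on its own, because the layout differs between files
sample['Last_Update'] = sample['Last_Update'].apply(pd.Timestamp)
print(sample)
```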

 

As you can see above, a few columns have missing values, which I'm going to fix:

df_covid.fillna({'Deaths': 0, 'Confirmed': 0, 'Recovered': 0}, inplace = True)

covid_cases_list = ['Confirmed','Deaths','Recovered']


for case in covid_cases_list:
    df_covid[case] = df_covid[case].astype(float)

 

The next step is to sum the Confirmed, Deaths and Recovered cases by country and "Last_Update" (for some countries, the files report separate provinces or states, which have to be added up):

df_covid = df_covid.groupby(['Country_Region', 'Last_Update']).agg({'Confirmed': 'sum', 'Deaths': 'sum', 'Recovered': 'sum'}).reset_index()

 

The "Last_Update" column shows the date of the data. However, to join and compare the Covid and influenza tables, I needed a week number for each day. To create the week number columns, I used the "isocalendar" method:

week_no = []
year_no = []


for value in df_covid['Last_Update']:
    week_no.append(value.isocalendar()[1])
    year_no.append(value.isocalendar()[0])

 

df_covid['Week'] = week_no
df_covid['Year'] = year_no
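The same columns can be built without a loop, because pandas exposes the ISO calendar through the .dt accessor (pandas 1.1 or later). This also shows why the ISO year is taken instead of the calendar year. Around New Year, the two can differ: 1 January 2021 belongs to ISO week 53 of 2020. The flu file appears to use ISO weeks too (2015 has 14 weeks, i.e. weeks 40-53). The helper name add_iso_year_week is hypothetical.

```python
import pandas as pd


def add_iso_year_week(df: pd.DataFrame, date_col: str = 'Last_Update') -> pd.DataFrame:
    """Return a copy with ISO 'Week' and 'Year' columns derived from a datetime column."""
    iso = df[date_col].dt.isocalendar()  # DataFrame with 'year', 'week', 'day'
    out = df.copy()
    out['Week'] = iso['week'].astype(int)
    out['Year'] = iso['year'].astype(int)
    return out


demo = pd.DataFrame({'Last_Update': pd.to_datetime(
    ['2020-12-31', '2021-01-01', '2021-01-04', '2021-05-20'])})
demo = add_iso_year_week(demo)
print(demo)
```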

 

Let's see what the example table for Poland looks like now. I chose week 20 of 2021:

df_covid[(df_covid.Country_Region == 'Poland')&(df_covid.Week == 20)&(df_covid.Year == 2021)]

 

Mariusz Borycki - covid-table

 

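I took the maximum values from the table above because I needed weekly data. The Johns Hopkins figures are cumulative totals, so the maximum within a week is the running total at the end of that week: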

df_covid = df_covid.groupby(['Country_Region', 'Week', 'Year'], sort = False).agg({'Confirmed': 'max', 'Deaths':'max','Recovered':'max'}).reset_index()
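Because the weekly values are running totals, the number of new cases in a week is the difference between consecutive weeks within each country. This is not part of my original analysis. It is a minimal sketch on made-up numbers, in case you need weekly new cases rather than the cumulative figures. It assumes rows are sorted by year and week within each country, and it treats the first week of each country as containing all cases reported up to that point.

```python
import pandas as pd

# Made-up cumulative figures in the same layout as the weekly COVID table
weekly = pd.DataFrame({
    'Country': ['Poland', 'Poland', 'Poland', 'Czechia', 'Czechia'],
    'Year': [2021, 2021, 2021, 2021, 2021],
    'Week': [18, 19, 20, 19, 20],
    'Confirmed': [2_800_000, 2_830_000, 2_850_000, 1_640_000, 1_650_000],
})

# Sorting by Year before Week keeps ISO week 53 -> week 1 transitions in order
weekly = weekly.sort_values(['Country', 'Year', 'Week'])
# Difference to the previous week of the same country; the first week keeps its total
weekly['New_Confirmed'] = (weekly.groupby('Country')['Confirmed'].diff()
                           .fillna(weekly['Confirmed']))
print(weekly)
```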

 

I also changed the column name for the region:

df_covid.rename(columns = {'Country_Region': 'Country'}, inplace = True)

 

Mariusz Borycki - covid-head2

 

Everything seems to be fine now. However, one more issue puzzled me: do both tables use the same country names, or are there some differences?

 

Mariusz Borycki - covid_flu-country_names

 

The flu table is my main table, because it contains only the European countries we need. As you can see, a few values are missing in the "cov" column, which lists the country names from the "df_covid" table. Below are the country names that differ between the two tables:

 

Mariusz Borycki - covid_flu-missing_countries_print
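The comparison in the screenshots boils down to a set difference between the country names of the two tables. Below is a minimal sketch. The helper name names_missing_from is hypothetical, and the country names in the example are only illustrative, not the actual list of differences.

```python
import pandas as pd


def names_missing_from(main: pd.DataFrame, other: pd.DataFrame, col: str = 'Country') -> list:
    """Names present in `main` but absent from `other`, sorted alphabetically."""
    return sorted(set(main[col]) - set(other[col]))


# Illustrative tables (not the real data)
flu = pd.DataFrame({'Country': ['Poland', 'Republic of Moldova', 'Germany']})
covid = pd.DataFrame({'Country': ['Poland', 'Moldova', 'Germany', 'Brazil']})
print(names_missing_from(flu, covid))  # ['Republic of Moldova']
```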

 

I made a list of the country names to change and refreshed the table:

 

Mariusz Borycki - covid_flu-missing_countries_func.jpg
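A renaming step like this is usually a dictionary passed to Series.replace, which swaps exact matches and leaves every other value untouched. In the sketch below, the mapping is hypothetical and only illustrates the shape. The real pairs come from the comparison above.

```python
import pandas as pd

# Hypothetical mapping: COVID-19 (JHU) name -> name used in the flu table
NAME_FIXES = {'Moldova': 'Republic of Moldova'}

covid = pd.DataFrame({'Country': ['Poland', 'Moldova'], 'Confirmed': [10, 5]})
covid['Country'] = covid['Country'].replace(NAME_FIXES)
print(covid['Country'].tolist())  # ['Poland', 'Republic of Moldova']
```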

 

I renamed the "Detected_Cases" column so that the final table makes it clear that it holds influenza cases:

df_flu.rename(columns = {'Detected_Cases': 'Detected_FluCases'}, inplace = True)

 

I merged the two tables, keeping all rows of the flu table (a right join), and checked which cells were empty. The weeks before the pandemic have no COVID-19 data, so I replaced all blank cells with zeros, as I needed numerical values throughout these columns:

final_df = pd.merge(df_covid, df_flu, on=['Country', 'Year', 'Week'], how='right').sort_values(['Year', 'Week', 'Country'])

final_df.fillna({'Deaths': 0, 'Confirmed': 0, 'Recovered': 0}, inplace=True)

 

For analysis purposes and chart clarity, I added the quarter of the year:

def quarter(x):
    """Map a week number (1-53) to a quarter of the year (1-4)."""
    if x <= 13:
        return 1
    elif x <= 26:
        return 2
    elif x <= 39:
        return 3
    else:
        return 4

# Each week belongs to exactly one quarter, so the mapping can be applied directly
# (pd.DataFrame(columns={...}) with a set is rejected by recent pandas versions)
final_df['Quarter'] = final_df['Week'].apply(quarter)
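The same week-to-quarter rule can also be written as a single vectorised expression: integer division by 13 gives the quarter, and week 53 is capped to Q4. Below is a minimal sketch. The helper name week_to_quarter is hypothetical.

```python
import pandas as pd


def week_to_quarter(weeks: pd.Series) -> pd.Series:
    """Weeks 1-13 -> Q1, 14-26 -> Q2, 27-39 -> Q3, 40-53 -> Q4."""
    return ((weeks - 1) // 13 + 1).clip(upper=4)


weeks = pd.Series([1, 13, 14, 26, 27, 39, 40, 52, 53])
print(week_to_quarter(weeks).tolist())  # [1, 1, 2, 2, 3, 3, 4, 4, 4]
```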

 

Mariusz Borycki - covid_flu-info.jpg

 

Currently we have 9 359 rows, no missing weeks, no empty cells and no negative values. Everything looks fine, so I saved the file on my computer and then uploaded it to my repository:

final_df.to_csv('Covid_and_Influenza.csv', index = False)
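The checks mentioned above (empty cells, negative values, weeks per year) can be scripted and run before saving the file. Below is a minimal sketch. The helper name sanity_report and the made-up data are hypothetical. In the article, it would be called on final_df with the case-count columns.

```python
import pandas as pd


def sanity_report(df: pd.DataFrame, count_cols: list) -> dict:
    """Summarise empty cells, negative counts and the number of distinct weeks per year."""
    return {
        'empty_cells': int(df.isna().sum().sum()),
        'negative_values': int((df[count_cols] < 0).sum().sum()),
        'weeks_per_year': df.groupby('Year')['Week'].nunique().to_dict(),
    }


toy = pd.DataFrame({
    'Country': ['Poland', 'Poland', 'Poland'],
    'Year': [2020, 2020, 2021],
    'Week': [1, 2, 1],
    'Confirmed': [0.0, 1.0, 5.0],
    'Detected_FluCases': [3, 0, 1],
})
print(sanity_report(toy, ['Confirmed', 'Detected_FluCases']))
```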

 

Analysis and Visualizations:

I created a separate Jupyter Notebook for the analysis, named "Flu_Covid_Analysis.ipynb". Of course, I started by importing the libraries and loading the file itself:

import pandas as pd
import matplotlib.pyplot as plt

url = "https://raw.githubusercontent.com/mborycki/Covid_Influenza_Comparison/main/Covid_and_Influenza.csv"

df = pd.read_csv(url)

 

It is worth recalling the column names and their meaning:

  • Country: Country name
  • Year: Year
  • Week: Week number
  • Confirmed: Cumulative number of confirmed SARS-CoV-2 cases
  • Deaths: Cumulative number of deaths from SARS-CoV-2
  • Recovered: Cumulative number of recoveries from SARS-CoV-2
  • Detected_FluCases: Number of detected influenza cases in the given week
  • Quarter: Quarter of the year

 

There are a few more points I wanted to check based on the table I prepared:

  1. How many flu cases were detected on a weekly basis (in Europe and Poland)?
  2. How many Covid-19 cases were detected on a weekly basis (in Europe and Poland)?
  3. The top 10 countries by flu / Covid-19 cases
  4. Finally, how many flu cases we had after Q2 2020

 

AD1. How many flu cases were detected on a weekly basis (in Europe and Poland)?

To see the flu history, I created the following function:

def WeeklyFluChart(table, where):
    """
    THE CHART SHOWS WEEKLY DETECTED FLU CASES, ONE LINE PER YEAR

    table: dataframe with influenza cases (DataFrame)
    where: country we are interested in. Required for a chart title (String/Object)
    """
    years = table.Year.unique()
    color_per_year = ['green', 'blue', 'yellow', 'orange', 'purple', 'red']  # one colour per year (2016-2021)

    # Pass the size to subplots(): changing rcParams after the figure exists has no effect on it
    fig, ax1 = plt.subplots(figsize=(25, 15))
    plt.grid(color='grey', linestyle='--', linewidth=0.5)

    for number, year in enumerate(years):
        # Sum all countries per week; x and y come from the same grouped result, so they stay aligned
        weekly = table[table.Year == year].groupby('Week')['Detected_FluCases'].sum()
        plt.plot(weekly.index, weekly.values, color=color_per_year[number], label=str(year))

    plt.title(f"Influenza Cases per year in {where}", fontsize=28)
    plt.xlabel('Weeks', fontsize=24)
    plt.ylabel("Detected Flu Cases", color='black', fontsize=24)
    plt.xticks(fontsize=16, rotation=45)
    plt.tick_params(axis='y', labelcolor='black', labelsize=16)
    plt.legend(fontsize=16)

 

Checking the result:

WeeklyFluChart(df, 'Europe')

 

Mariusz Borycki - flu_cases-line_chart_Europe_yearly

 

The highest numbers of detected flu cases were in 2018 and 2019. In 2021, the number of detected flu cases was close to zero. Of course, the data doesn't cover the whole of 2021, but the biggest increase in detections usually occurs between weeks 5 and 10 of each year, and in 2021 there is no increase in those weeks.

 

Checking the same report, but for Poland:

WeeklyFluChart(df[df.Country == 'Poland'], 'Poland')

 

Mariusz Borycki - flu_cases-line_chart_Poland_yearly

 

The graph looks similar to the one for Europe. However, in Poland the biggest increase was in 2016, and there is only a small increase in flu detections after week 50 (in Europe, there were more influenza cases at that time). What Europe and Poland have in common is that there were virtually no flu cases in 2021.

 

AD2. How many covid cases were detected on a weekly basis (in Europe and Poland)?

 

What we know about the pandemic is that there were no officially confirmed cases of this virus prior to 2020. We can confirm this with the code line below:

df[df.Confirmed > 0].Year.unique()

out: [2021, 2020]

 

I wanted to see the Covid-19 results in a chart and compare them with the corresponding flu data. So, I decided to aggregate the weekly values into years and quarters. I put the new data in a table called "df_q". The cumulative Covid-19 figures take their maximum per quarter, while the weekly flu cases are summed:

# Wrapping the chain in parentheses avoids the backslash continuations,
# which break as soon as an empty line appears between them
df_q = (
    df[['Country', 'Year', 'Quarter', 'Confirmed', 'Deaths', 'Recovered', 'Detected_FluCases']]
    .groupby(['Country', 'Year', 'Quarter'])
    .agg({'Confirmed': 'max', 'Deaths': 'max', 'Recovered': 'max', 'Detected_FluCases': 'sum'})
    .reset_index()
    .sort_values(['Country', 'Year', 'Quarter'])
)

 

Now we can write a function for our chart:

def CovidChart(table, where):
    """
    THE CHART SHOWS CONFIRMED COVID CASES AT THE QUARTERLY LEVEL

    table: dataframe with covid cases (DataFrame)
    where: country we are interested in. Required for a chart title (String/Object)
    """
    # Sum all countries for each quarter since 2020
    tbl = (table[table.Year >= 2020]
           .groupby(['Year', 'Quarter'])
           .agg({'Confirmed': 'sum', 'Deaths': 'sum', 'Recovered': 'sum', 'Detected_FluCases': 'sum'})
           .sort_values(['Year', 'Quarter'])
           .reset_index())

    # Create a new column for Year and Quarters, e.g. '2020-Q4'
    tbl['YearQuarter'] = tbl.Year.astype(str) + '-Q' + tbl.Quarter.astype(str)

    # Pass the size to subplots(): changing rcParams after the figure exists has no effect on it
    fig, ax1 = plt.subplots(figsize=(25, 15))
    plt.grid(color='grey', linestyle='--', linewidth=0.5)

    # tbl already has one row per quarter, in chronological order
    x = tbl['YearQuarter']
    y = tbl['Confirmed']
    plt.title(f"Covid19 Cases per quarter in {where}", fontsize=28)
    plt.xlabel('Quarters', fontsize=18)
    plt.ylabel("Detected Covid19 Cases", color='black', fontsize=18)
    plt.bar(x, y, color='grey')
    plt.xticks(fontsize=14, rotation=45)
    plt.tick_params(axis='y', labelcolor='black', labelsize=16)

    # Show each bar's value with a thousands separator
    for xx, yy in zip(x, y):
        plt.annotate(f'{yy:,.0f}', (xx, yy), textcoords="offset points", xytext=(0, 10), ha='center', fontsize=16)

CovidChart(df_q, 'Europe')

 

 

Mariusz Borycki - covid_cases_bar_chart-Europe

 

As seen above, there was a large increase in COVID-19 cases in the fourth quarter of 2020. The first quarter of 2021 was not much better. However, there is a clear improvement in the second quarter of 2021:

CovidChart(df_q[df_q.Country == 'Poland'], 'Poland')

 

 

Mariusz Borycki - covid_cases_bar_chart-Poland

 

Poland shows a trend quite similar to the one in Europe.

 

Now we can create a single chart with both the flu and the Covid-19 cases. However, it is important to remember that this is only meant to show the trend, as there are far more coronavirus infections than detected flu cases (which is why each virus gets its own y-axis).

The first function divides the values in the table by a thousand. I did this mainly to avoid showing values in the millions on the chart. I also rounded the numbers to two decimal places, which further improves readability. Finally, the function creates a new column combining years and quarters:

# Scale the case counts to thousands to make the chart easier to read
def CovidCasesDevider(table_name):
    covid_cases_list_mln = ['Confirmed', 'Deaths', 'Recovered', 'Detected_FluCases']
    table_name = table_name.copy()

    # number of cases divided by 1000 and rounded to 2 decimal places
    table_name[covid_cases_list_mln] = (table_name[covid_cases_list_mln] / 1000).round(2)

    # Do not need weeks/year in chart - quarters is enough
    table_name['YearQuater'] = table_name.Year.astype(str) + '-Q' + table_name.Quarter.astype(str)
    table_name = table_name.drop(["Year", "Quarter"], axis=1)

    return table_name

 

And below is the final function, which draws the chart comparing both viruses:

def VirusComparison(table, where):
    """
    THE CHART SHOWS A QUARTERLY COMPARISON OF DETECTED COVID AND FLU CASES

    table: dataframe with detected cases (DataFrame)
    where: country we are interested in. Required for a chart title (String/Object)
    """
    # Chart Creation (size passed directly; rcParams set afterwards would not apply)
    fig, ax1 = plt.subplots(figsize=(25, 15))
    ax1.grid(color='green', linestyle='--', linewidth=0.5)

    # Data Aggregation: sum all countries per quarter, then scale to thousands
    chart = CovidCasesDevider(table.groupby(['Year', 'Quarter'])
                              .agg({'Confirmed': 'sum', 'Deaths': 'sum', 'Recovered': 'sum', 'Detected_FluCases': 'sum'})
                              .sort_values(['Year', 'Quarter']).reset_index())

    x = chart['YearQuater']
    y = chart['Confirmed']
    z = chart['Detected_FluCases']

    # Covid19 (left y-axis):
    color = 'tab:red'
    ax1.set_title(f"Covid and Flu Cases Comparison in {where} ('000)", fontsize=28)
    ax1.set_xlabel('Periods', fontsize=24)
    ax1.set_ylabel("Confirmed Covid19 Cases ('000)", color=color, fontsize=24)
    ax1.plot(x, y, color=color)
    ax1.tick_params(axis='y', labelcolor=color, labelsize=16)
    ax1.tick_params(axis='x', labelsize=16, labelrotation=45)

    for xx, yy in zip(x, y):
        ax1.annotate(f"{yy:.0f}", (xx, yy), textcoords="offset points", xytext=(0, 10), ha='right', fontsize=20)

    # Influenza (right y-axis sharing the same x-axis):
    ax2 = ax1.twinx()
    color = 'tab:blue'
    ax2.set_ylabel("Detected Influenza Cases ('000)", color=color, fontsize=24)  # the x-label is already set on ax1
    ax2.plot(x, z, color=color)
    ax2.tick_params(axis='y', labelcolor=color, labelsize=16)

    for xx, zz in zip(x, z):
        ax2.annotate(f"{zz:.1f}", (xx, zz), textcoords="offset points", xytext=(0, 10), ha='right', fontsize=20)

    fig.tight_layout()
    plt.show()

VirusComparison(df_q, 'Europe')

 

 

Mariusz Borycki - covid_vs_flu_cases-line_chart_Europe

 

There is a clear link in timing between Covid-19 and the flu: when the pandemic came to Europe, the flu almost disappeared. Remember that the values on the chart are divided by 1000.

 

Thus, the highest quarterly number of influenza cases was in Q1 2018: 184,900 cases.
On the other hand, the Covid-19 figures are cumulative, and they reached their highest value in Q2 2021: 52,285,000 confirmed cases.

 

Let's see how it looked in Poland:

chosen_country = 'Poland'


VirusComparison(df_q[df_q.Country == chosen_country], chosen_country)

 

 

Mariusz Borycki - covid_vs_flu_cases-line_chart_Poland

The trend in Poland is similar to the one we saw on the chart for Europe as a whole.

 

AD3. Top 10 countries with flu / covid19

 

The next step was to check the top 10 countries with the highest influenza and covid19 detection.

 

The most influenza cases reported (each country's highest quarterly total):

df_q.groupby(['Country', 'Year'])['Detected_FluCases'].max().sort_values(ascending=False).reset_index().drop_duplicates('Country').head(10).set_index('Country')

 

 

Mariusz Borycki - flu_cases-TOP10_Europe

 

The most Covid-19 cases reported (each country's highest cumulative number of confirmed cases):

df_q.groupby(['Country', 'Year'])['Confirmed'].max().sort_values(ascending=False).reset_index().drop_duplicates('Country').head(10).set_index('Country')

 

 

Mariusz Borycki - covid_cases-TOP10

 

AD4. Finally, I will check how many flu cases we had after the second quarter of 2020

 

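# From Q2 2020 onwards: every quarter of 2021 counts, not only Q2 and later
after_q2_2020 = (df_q.Year > 2020) | ((df_q.Year == 2020) & (df_q.Quarter >= 2))

(df_q[after_q2_2020].groupby(['Country', 'Year'])['Detected_FluCases'].max()
 .sort_values(ascending=False).reset_index().drop_duplicates('Country').head(10).set_index('Country'))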

 

 

Mariusz Borycki - flu_cases-TOP10_Europe2

 

 

Conclusion:

Based on the data I have prepared, we can say with certainty that the SARS-CoV-2 pandemic influenced the detection of influenza. However, the number of flu cases has never (at least since 2016) come close to the scale of the Covid-19 pandemic across Europe.

The scale of the coronavirus pandemic is enormous, and there is no doubt that we have been hit hard by it.


Please feel free to visit my GitHub account, where you can find all of the scripts from this project along with their descriptions.
