kokobob.com

Analyzing COVID-19 Time Series Data with Pandas in Python

Chapter 1: Introduction to Time Series Data

In data science, time series are among the most common kinds of data you will encounter. This tutorial gives a concise introduction to using Pandas to manipulate and analyze the confirmed COVID-19 cases dataset published by the Johns Hopkins University (JHU) CSSE.

Let's dive right in!

If you are new to Pandas or Python, begin by downloading the latest version of Python, then install Pandas by running the following command in your console:

$ pip install pandas

Section 1.1: Preparing the Dataframe

Start by creating a project folder and downloading the time-series CSV file (time_series_covid19_confirmed_global.csv) from Johns Hopkins CSSE into it.

Next, create a new Python file and load the CSV into a Pandas DataFrame:

import pandas as pd

df = pd.read_csv('time_series_covid19_confirmed_global.csv')

print(df)

[Figure: DataFrame of COVID-19 confirmed cases]

Section 1.2: Cleaning the Data

Now, let's clean the data. The dataset includes cases at the Province/State level in certain areas, so we will aggregate this information to the Country/Region level using the groupby function to sum the total cases.

Before aggregating, we drop the columns we don't need: Province/State, Lat, and Long. Summing latitudes and longitudes across provinces would be meaningless.

df = df.drop(columns=['Province/State', 'Lat', 'Long'])

df = df.groupby('Country/Region').agg('sum')
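To see what the drop-then-group step does, here is a minimal sketch on a tiny hand-made table. The rows and numbers are invented for illustration and only mimic the layout of the JHU file; the names sample and by_country are not part of the original tutorial.

```python
import pandas as pd

# Hypothetical sample in the same wide layout as the JHU file (values are made up).
sample = pd.DataFrame({
    'Province/State': ['Hubei', 'Beijing', None],
    'Country/Region': ['China', 'China', 'Italy'],
    'Lat': [30.97, 40.18, 41.87],
    'Long': [112.27, 116.41, 12.57],
    '1/22/20': [444, 14, 0],
    '1/23/20': [444, 22, 0],
})

# Drop the columns that should not be summed, then add up provinces per country.
by_country = (
    sample.drop(columns=['Province/State', 'Lat', 'Long'])
          .groupby('Country/Region')
          .sum()
)
print(by_country)
```

The two Chinese provinces collapse into a single China row, while Italy, which has no provincial breakdown in this sample, passes through unchanged.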

Chapter 2: Preparing the Datetime Index

Next, we need to create a DateTime index for our DataFrame. To do this, we will transpose the DataFrame first:

df = df.T

In this transposed DataFrame, the index now holds the dates. However, they are still strings such as '1/22/20', so we convert them with pd.to_datetime, which returns a DatetimeIndex when given an index:

# The JHU column labels use month/day/two-digit-year, e.g. '1/22/20'
df.index = pd.to_datetime(df.index, format='%m/%d/%y')
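If you want to check the conversion in isolation, the following sketch parses a few hypothetical labels written in the same month/day/two-digit-year style as the JHU columns. The explicit format string is an assumption about that layout; without it, Pandas tries to infer the format.

```python
import pandas as pd

# Hypothetical labels in the JHU month/day/two-digit-year style.
labels = pd.Index(['1/22/20', '1/23/20', '2/1/20'])

# An explicit format avoids any month-first vs. day-first ambiguity.
dates = pd.to_datetime(labels, format='%m/%d/%y')
print(dates)
```

The result is a DatetimeIndex, which is what enables the date-based selection and resampling used in the rest of this tutorial.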

Section 2.1: Exploring the Time-Series Data

Our time-series DataFrame is now set up with a DateTime index. You can extract data from specific dates. For instance:

  • To select confirmed COVID-19 cases for the 15th of any month:

df[df.index.day == 15]

  • To select data from April:

df[df.index.month == 4] # All years

# or

df.loc['2020-04'] # April 2020 only (plain df['2020-04'] no longer selects rows in pandas 2.x)

  • To select data from April 1, 2020, to April 5, 2020:

df.loc['2020-04-01':'2020-04-05'] # both endpoints are included

  • To keep only the six countries with the most confirmed cases on the latest date (note that this overwrites df):

df = df.sort_values(by=df.index[-1], axis=1, ascending=False)

df = df.iloc[:, :6]
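The selections above can be tried out without the real dataset. The sketch below builds a small made-up DataFrame (three fictional countries A, B, and C with steadily increasing counts) and applies the same patterns; all names and numbers here are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical cumulative counts, daily from March 1 to May 31, 2020.
idx = pd.date_range('2020-03-01', '2020-05-31', freq='D')
n = len(idx)
demo = pd.DataFrame(
    {'A': range(n), 'B': range(0, 2 * n, 2), 'C': range(0, 3 * n, 3)},
    index=idx,
)

mid_month = demo[demo.index.day == 15]              # the 15th of each month
april = demo.loc['2020-04']                         # every day in April 2020
first_days = demo.loc['2020-04-01':'2020-04-05']    # slice includes both ends

# Rank the columns by their value on the last date and keep the top two.
top2 = demo.sort_values(by=demo.index[-1], axis=1, ascending=False).iloc[:, :2]
print(top2.columns.tolist())
```

Unlike ordinary Python slices, label-based date slices include the end date, so first_days has five rows rather than four.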

Chapter 3: Resampling the Time-Series Data

Now for the exciting part: resampling the time-series data with the resample method. It changes the frequency of your time series, and it requires a frequency alias as its argument (calling it with no argument raises an error):

df.resample(timeinterval)

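Commonly used aliases, where n is an optional integer multiplier, include:

  • 'nD' for n days (for example, '3D')
  • 'nW' for n weeks (for example, '2W')
  • 'nM' for n months, anchored at month end (pandas 2.2 and later prefer the spelling 'ME')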

resample only groups the rows into time bins, so you follow it with an aggregation. Here timeinterval stands for any alias such as 'W':

df.resample(timeinterval).sum() # Sum

df.resample(timeinterval).mean() # Mean

For example, to compute the weekly mean of confirmed COVID-19 cases and plot it (plotting requires matplotlib to be installed):

df.resample('W').mean()

df.resample('W').mean().plot()
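To make the weekly binning concrete, here is a small sketch on two weeks of made-up daily values. The 'W' alias produces weeks that end on Sunday and labels each bin with that Sunday. The variable names and numbers are assumptions for illustration.

```python
import pandas as pd

# Hypothetical daily values from Monday 2020-03-02 through Sunday 2020-03-15.
idx = pd.date_range('2020-03-02', periods=14, freq='D')
demo = pd.DataFrame({'cases': range(1, 15)}, index=idx)

weekly_mean = demo.resample('W').mean()   # average level within each week
weekly_last = demo.resample('W').last()   # value on the last day of each week

print(weekly_mean)
print(weekly_last)
```

Because the JHU counts are cumulative, the weekly mean describes the average running total during the week, while last() gives the running total at the end of the week. Which one you want depends on the question you are asking.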

Chapter 4: Analyzing Percentage Growth

To analyze the week-over-week growth of COVID-19 cases, apply .pct_change() to the resampled data. The result is expressed as a fraction, so 0.5 means 50% growth:

df.resample('W').mean().pct_change()

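You can plot the result. The first row is always NaN because it has no previous week to compare against, and countries with zero cases in the early weeks can produce further NaN or infinite values, so this example skips the first two rows with .iloc[2:]: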

df.resample('W').mean().pct_change().iloc[2:].plot(marker="v", figsize=(15, 5))
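The behavior of .pct_change() at the start of a series is easier to see on a few made-up numbers. In this sketch, the column 'early' has cases from the first week, while 'late' starts at zero; both columns and their values are hypothetical.

```python
import pandas as pd

# Hypothetical weekly values for two fictional countries.
weekly = pd.DataFrame(
    {'early': [100.0, 150.0, 300.0], 'late': [0.0, 0.0, 10.0]},
    index=pd.date_range('2020-03-08', periods=3, freq='W'),
)

# Each row is (this week / previous week) - 1.
growth = weekly.pct_change()
print(growth)
```

The 'early' column yields NaN, 0.5, and 1.0, that is, no value for the first week, then 50% and 100% growth. The 'late' column yields NaN, NaN (0 divided by 0), and inf (growth from zero), which is why trimming or filtering the first rows helps before plotting.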

Conclusion

This article is a basic guide to manipulating and analyzing time series data with Pandas, using the confirmed COVID-19 cases dataset as the example. It covers only the fundamentals: a DateTime index, date-based selection, resampling, and percentage change. For more advanced techniques, refer to the official Pandas time-series documentation.

I hope this guide proves beneficial for your projects and daily tasks. Feel free to reach out with any questions or feedback.

Stay safe and healthy! 💪

Thank you for reading. 📚
