How to handle missing data in pandas

Published on Aug. 22, 2023, 12:17 p.m.

To handle missing data in a pandas DataFrame, you can use the fillna() method to replace missing values with a specified value, or the interpolate() method to estimate missing values from the surrounding data.

Here’s an example of how to use fillna() to replace missing values in a DataFrame:

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({'A': [1, 2, None, 4],
                   'B': [None, 6, 7, 8],
                   'C': [9, 10, 11, None]})

# Replace missing values with 0
df.fillna(0, inplace=True)

print(df)

In this example, fillna() replaces all missing values with 0. The inplace=True parameter is used to modify the DataFrame in place.
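A constant such as 0 is not always a sensible replacement. As a minimal sketch on the same sample data, the example below first counts the missing values per column with isna(), then fills each column with its own mean. The variable names missing_counts and filled are illustrative only. Without inplace=True, fillna() returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

# Same sample DataFrame as above
df = pd.DataFrame({'A': [1, 2, None, 4],
                   'B': [None, 6, 7, 8],
                   'C': [9, 10, 11, None]})

# Count missing values in each column
missing_counts = df.isna().sum()
print(missing_counts)

# Fill each column with its own mean; df.mean() skips missing values by default.
# fillna() returns a new DataFrame here, so df itself is not modified.
filled = df.fillna(df.mean())
print(filled)
```

Whether the mean is appropriate depends on the data; for skewed columns the median (df.median()) may be a better choice.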

Here’s an example of how to use interpolate() to fill in missing values:

import pandas as pd

# Create a sample DataFrame
df = pd.DataFrame({'A': [1, 2, None, 4],
                   'B': [None, 6, 7, 8],
                   'C': [9, 10, 11, None]})

# Interpolate missing values
df.interpolate(inplace=True)

print(df)

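In this example, interpolate() uses linear interpolation, its default method, to fill in the missing values. The missing value in column A becomes 3.0, the midpoint of its neighbors. With the default settings, the trailing gap in column C is filled with the last valid value (11.0), while the leading gap in column B stays missing because there is no earlier value to interpolate from. The inplace=True parameter is used to modify the DataFrame in place.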
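With default settings, interpolate() does not fill a gap at the very start of a column. The sketch below, using the same sample data, passes limit_direction='both' so that gaps at either end are also filled. The variable name result is illustrative only.

```python
import pandas as pd

# Same sample DataFrame as above
df = pd.DataFrame({'A': [1, 2, None, 4],
                   'B': [None, 6, 7, 8],
                   'C': [9, 10, 11, None]})

# Linear interpolation that also fills gaps at the start and end of a column.
# Edge gaps have only one neighbor, so they take the nearest valid value.
result = df.interpolate(method='linear', limit_direction='both')
print(result)
```

Values filled at the edges are copies of the nearest valid value rather than true interpolations, so it is worth checking that this is acceptable for your data.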

Note that there are many other strategies for handling missing data, such as dropping rows or columns that contain missing values, or using machine learning models to impute missing values based on other features in the dataset. The best strategy may depend on the specific application and dataset.
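As a brief sketch of the dropping strategy mentioned above, the example below uses dropna() on the same sample data. The variable names are illustrative only.

```python
import pandas as pd

# Same sample DataFrame as above
df = pd.DataFrame({'A': [1, 2, None, 4],
                   'B': [None, 6, 7, 8],
                   'C': [9, 10, 11, None]})

# Drop every row that contains at least one missing value
rows_dropped = df.dropna()

# Drop rows only when column 'A' is missing
subset_dropped = df.dropna(subset=['A'])

# Drop every column that contains at least one missing value
cols_dropped = df.dropna(axis=1)

print(rows_dropped)
print(subset_dropped)
print(cols_dropped)
```

In this small sample every column has a missing value, so dropping columns leaves no data at all, which shows how quickly dropping can discard information.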
