How to remove duplicates in pandas?

Published on Aug. 22, 2023, 12:18 p.m.

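To remove duplicate rows from a pandas DataFrame, use the drop_duplicates() method. It returns a new DataFrame with the duplicate rows removed and leaves the original unchanged unless you pass inplace=True. By default every column is compared. To compare only some columns, pass a column label or a list of labels as the subset argument. Here is an example: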

import pandas as pd

# Create a DataFrame with duplicate rows
df = pd.DataFrame({'col1': ['A', 'B', 'A'], 'col2': [1, 2, 1]})

# Remove duplicates based on col1 and col2 columns
df = df.drop_duplicates(subset=['col1', 'col2'])

# Print the new DataFrame
print(df)

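In this code, we create a DataFrame df whose third row ('A', 1) repeats the first. We then call drop_duplicates() with ['col1', 'col2'] as the subset, so rows are compared on those two columns. By default the first occurrence of each duplicate is kept, so the result contains the rows labeled 0 and 1. The original index labels are preserved rather than renumbered.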
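The keep parameter of drop_duplicates() controls which occurrence survives: 'first' (the default), 'last', or False to drop every row that has a duplicate. The sketch below compares on a single column so the difference is visible. The DataFrame values are made up for illustration.

```python
import pandas as pd

# Illustrative data: 'A' appears twice in col1, with different col2 values
df = pd.DataFrame({'col1': ['A', 'B', 'A'], 'col2': [1, 2, 3]})

# Compare only col1 and keep the first occurrence of each value (the default)
first = df.drop_duplicates(subset='col1', keep='first')

# Compare only col1 and keep the last occurrence instead
last = df.drop_duplicates(subset='col1', keep='last')

# Drop every row whose col1 value appears more than once
unique_only = df.drop_duplicates(subset='col1', keep=False)

print(first)
print(last)
print(unique_only)
```

With keep='last' the surviving 'A' row is the one with col2 equal to 3. With keep=False only the 'B' row remains, because both 'A' rows are discarded.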

If you want to remove duplicates based on all columns, you can call drop_duplicates() without any arguments:

# Remove duplicates based on all columns
df = df.drop_duplicates()

# Print the new DataFrame
print(df)

With no arguments, drop_duplicates() treats two rows as duplicates only if they match in every column. Since col1 and col2 are the only columns here, this gives the same result as the previous example.
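Before dropping anything, you can preview which rows would be removed with the related duplicated() method, which returns a boolean Series. To get a result indexed 0, 1, ..., n-1 instead of keeping the original labels, pass ignore_index=True (available in pandas 1.0 and later). The following is a minimal sketch with illustrative data.

```python
import pandas as pd

# Illustrative data: row 2 repeats row 0 in every column
df = pd.DataFrame({'col1': ['A', 'B', 'A', 'C'], 'col2': [1, 2, 1, 3]})

# True marks a row that repeats an earlier row across all columns
mask = df.duplicated()

# Inspect the rows that drop_duplicates() would remove
print(df[mask])

# Remove duplicates and relabel the resulting index from 0
deduped = df.drop_duplicates(ignore_index=True)
print(deduped)
```

duplicated() accepts the same subset and keep arguments as drop_duplicates(), so the preview matches whatever rule you later use to drop rows.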
