How to Count Duplicates in a Pandas DataFrame

Published on Aug. 22, 2023, 12:12 p.m.

You can count duplicates in a Pandas DataFrame, meaning the number of times each value appears in a column, using this approach:

df.pivot_table(columns=['DataFrame Column'], aggfunc='size')
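`groupby(...).size()` gives the same per-value counts and may be more familiar than `pivot_table`. Below is a minimal sketch. The helper name `count_values` and the small sample DataFrame are hypothetical and are only there for illustration.

```python
import pandas as pd


def count_values(df: pd.DataFrame, column: str) -> pd.Series:
    """Count how many times each distinct value appears in `column`.

    Hypothetical helper: uses groupby(...).size() as an alternative to
    df.pivot_table(columns=[column], aggfunc='size').
    """
    # groupby sorts the group labels by default, so the result is alphabetical
    return df.groupby(column).size()


# Small illustrative DataFrame (not from the original example)
sample = pd.DataFrame({"Color": ["Green", "Green", "Blue"]})
print(count_values(sample, "Color"))
```

By default, `groupby` leaves missing values (NaN) out of the groups, so they are not counted.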

Duplicates in a Pandas DataFrame

To count duplicates under a single DataFrame column, start by creating a DataFrame with two columns, Color and Shape:

import pandas as pd

boxes = {'Color': ['Green','Green','Green','Blue','Blue','Red','Red','Red'],
         'Shape': ['Rectangle','Rectangle','Square','Rectangle','Square','Square','Square','Rectangle']
        }

df = pd.DataFrame(boxes, columns=['Color', 'Shape'])
print(df)

You’ll get this DataFrame:

   Color      Shape
0  Green  Rectangle
1  Green  Rectangle
2  Green     Square
3   Blue  Rectangle
4   Blue     Square
5    Red     Square
6    Red     Square
7    Red  Rectangle

Notice that there are duplicates under both the Color and Shape columns.
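Before counting each column separately, note that duplicates can also be counted across both columns at once, that is, how often each Color/Shape combination occurs. This is a minimal sketch assuming the same `boxes` data. It uses `groupby(...).size()` for the per-combination counts and `DataFrame.duplicated()` to count rows that repeat an earlier row. Both are swapped in here and are not part of the `pivot_table` method above.

```python
import pandas as pd

boxes = {'Color': ['Green', 'Green', 'Green', 'Blue', 'Blue', 'Red', 'Red', 'Red'],
         'Shape': ['Rectangle', 'Rectangle', 'Square', 'Rectangle', 'Square', 'Square', 'Square', 'Rectangle']}
df = pd.DataFrame(boxes)

# Number of times each (Color, Shape) combination appears
combo_counts = df.groupby(['Color', 'Shape']).size()
print(combo_counts)

# duplicated() marks every row that repeats an earlier row (keep='first' by default),
# so the sum is the number of repeated rows
repeated_rows = df.duplicated().sum()
print(repeated_rows)  # 2
```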

Next, count the duplicates under each column using the method introduced at the beginning of this guide:

df.pivot_table(columns=['DataFrame Column'], aggfunc='size')

For example, here is the Python code to count the duplicates under the Color column:

import pandas as pd

boxes = {'Color': ['Green','Green','Green','Blue','Blue','Red','Red','Red'],
         'Shape': ['Rectangle','Rectangle','Square','Rectangle','Square','Square','Square','Rectangle']
        }

df = pd.DataFrame(boxes, columns=['Color', 'Shape'])

dups_color = df.pivot_table(columns=['Color'], aggfunc='size')
print(dups_color)

Run the code, and you’ll get the number of times each color appears:

Blue     2
Green    3
Red      3
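These counts are total occurrences, so a color that appears 3 times has 2 copies beyond its first appearance. If only those extra copies matter, one option is to subtract 1 from each count, or to flag repeats with `duplicated(subset=...)`. This is a sketch assuming the same `boxes` data. It uses `groupby(...).size()` in place of `pivot_table` and is not part of the original method.

```python
import pandas as pd

boxes = {'Color': ['Green', 'Green', 'Green', 'Blue', 'Blue', 'Red', 'Red', 'Red'],
         'Shape': ['Rectangle', 'Rectangle', 'Square', 'Rectangle', 'Square', 'Square', 'Square', 'Rectangle']}
df = pd.DataFrame(boxes)

# Total occurrences of each color (same numbers as the pivot_table result)
color_counts = df.groupby('Color').size()

# Copies beyond the first occurrence of each color
extra_copies = color_counts - 1
print(extra_copies)

# duplicated(subset=['Color']) is True for every row whose Color appeared earlier
print(df.duplicated(subset=['Color']).sum())  # 5
```

Both approaches agree: the per-color extra copies (1 + 2 + 2) add up to the 5 rows flagged by `duplicated`.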

Alternatively, you can count the duplicates under the Shape column:

import pandas as pd

boxes = {'Color': ['Green','Green','Green','Blue','Blue','Red','Red','Red'],
         'Shape': ['Rectangle','Rectangle','Square','Rectangle','Square','Square','Square','Rectangle']
        }

df = pd.DataFrame(boxes, columns=['Color', 'Shape'])

dups_shape = df.pivot_table(columns=['Shape'], aggfunc='size')
print(dups_shape)

You’ll then see that each shape appears 4 times:

Rectangle    4
Square       4
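Instead of repeating the script for each column, a loop can produce the counts for every column in one pass. This is a minimal sketch using `Series.value_counts()`, an alternative to `pivot_table` that sorts by count (highest first) rather than alphabetically. The dictionary name `counts_by_column` is hypothetical.

```python
import pandas as pd

boxes = {'Color': ['Green', 'Green', 'Green', 'Blue', 'Blue', 'Red', 'Red', 'Red'],
         'Shape': ['Rectangle', 'Rectangle', 'Square', 'Rectangle', 'Square', 'Square', 'Square', 'Rectangle']}
df = pd.DataFrame(boxes)

# Map each column name to the number of times each of its values appears
counts_by_column = {col: df[col].value_counts() for col in df.columns}

for col, counts in counts_by_column.items():
    print(f"--- {col} ---")
    # sort_index() lists the values alphabetically, like the pivot_table output
    print(counts.sort_index())
```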