How to Count Duplicates in a Pandas DataFrame
Published on Aug. 22, 2023, 12:12 p.m.
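To count how many times each value occurs in a Pandas DataFrame column, and so spot duplicates, use pivot_table with aggfunc='size', replacing 'DataFrame Column' with the name of your column: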
df.pivot_table(columns=['DataFrame Column'], aggfunc='size')
Duplicates in Pandas DataFrame
Count duplicates under a single DataFrame column.
import pandas as pd

# Sample data: each row describes one box
boxes = {
    'Color': ['Green', 'Green', 'Green', 'Blue', 'Blue', 'Red', 'Red', 'Red'],
    'Shape': ['Rectangle', 'Rectangle', 'Square', 'Rectangle', 'Square', 'Square', 'Square', 'Rectangle'],
}
df = pd.DataFrame(boxes, columns=['Color', 'Shape'])
print(df)
You’ll get this DataFrame:
   Color      Shape
0  Green  Rectangle
1  Green  Rectangle
2  Green     Square
3   Blue  Rectangle
4   Blue     Square
5    Red     Square
6    Red     Square
7    Red  Rectangle
Both the Color and Shape columns contain values that appear more than once.
Then count the duplicates under each column using the method that was introduced at the beginning of this guide:
df.pivot_table(columns=['DataFrame Column'], aggfunc='size')
Python code:
import pandas as pd

boxes = {
    'Color': ['Green', 'Green', 'Green', 'Blue', 'Blue', 'Red', 'Red', 'Red'],
    'Shape': ['Rectangle', 'Rectangle', 'Square', 'Rectangle', 'Square', 'Square', 'Square', 'Rectangle'],
}
df = pd.DataFrame(boxes, columns=['Color', 'Shape'])

# Count how many times each color occurs
dups_color = df.pivot_table(columns=['Color'], aggfunc='size')
print(dups_color)
You’ll get the number of occurrences of each color:
Blue 2
Green 3
Red 3
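As a cross-check, the same per-value counts can be obtained with value_counts. This is a minimal sketch of an alternative to the pivot_table approach used in this guide, not a replacement for it; the variable name color_counts is an assumption made for illustration.

```python
import pandas as pd

boxes = {
    'Color': ['Green', 'Green', 'Green', 'Blue', 'Blue', 'Red', 'Red', 'Red'],
    'Shape': ['Rectangle', 'Rectangle', 'Square', 'Rectangle', 'Square', 'Square', 'Square', 'Rectangle'],
}
df = pd.DataFrame(boxes, columns=['Color', 'Shape'])

# value_counts sorts by frequency by default; sort_index puts the
# labels in alphabetical order so they line up with the counts above.
# color_counts is a hypothetical name used only in this sketch.
color_counts = df['Color'].value_counts().sort_index()
print(color_counts)
```

Any value whose count is greater than 1 appears more than once in the column.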
Similarly, you can count the duplicates in the Shape column:
import pandas as pd

boxes = {
    'Color': ['Green', 'Green', 'Green', 'Blue', 'Blue', 'Red', 'Red', 'Red'],
    'Shape': ['Rectangle', 'Rectangle', 'Square', 'Rectangle', 'Square', 'Square', 'Square', 'Rectangle'],
}
df = pd.DataFrame(boxes, columns=['Color', 'Shape'])

# Count how many times each shape occurs
dups_shape = df.pivot_table(columns=['Shape'], aggfunc='size')
print(dups_shape)
You’ll see that each shape appears 4 times:
Rectangle 4
Square 4
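The counts above look at one column at a time. If you instead want to know which Color and Shape combinations repeat, one possible approach is sketched below. It uses groupby and duplicated rather than pivot_table, and the names pair_counts, repeated_pairs and n_repeated_rows are assumptions made for illustration.

```python
import pandas as pd

boxes = {
    'Color': ['Green', 'Green', 'Green', 'Blue', 'Blue', 'Red', 'Red', 'Red'],
    'Shape': ['Rectangle', 'Rectangle', 'Square', 'Rectangle', 'Square', 'Square', 'Square', 'Rectangle'],
}
df = pd.DataFrame(boxes, columns=['Color', 'Shape'])

# Count how many rows share each (Color, Shape) combination
pair_counts = df.groupby(['Color', 'Shape']).size()

# Keep only the combinations that occur more than once
repeated_pairs = pair_counts[pair_counts > 1]
print(repeated_pairs)

# duplicated() flags every row that repeats an earlier row exactly,
# so the sum is the number of extra copies in the DataFrame
n_repeated_rows = int(df.duplicated().sum())
print(n_repeated_rows)
```

With this sample data, only the Green Rectangle and Red Square combinations occur twice, so two rows are flagged as repeats of an earlier row.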