How to group and aggregate data using pandas
Published on Aug. 22, 2023, 12:17 p.m.
To group and aggregate data using pandas, you can use the groupby()
method followed by an aggregate function. Here are the steps:
- Load the data into a pandas DataFrame using
read_csv()
orread_excel()
functions. - Group the data by one or more columns using the
groupby()
method. - Apply an aggregate function to the grouped data using the
agg()
method. You can use built-in functions likesum()
,mean()
,max()
,min()
, or provide your own custom function. - Reset the index of the resulting DataFrame using the
reset_index()
method.
Here’s an example of how to group and aggregate a sample DataFrame using pandas:
import pandas as pd
# Load data into a DataFrame
df = pd.read_csv('data.csv')
# Group data by a column and calculate the sum of another column for each group
grouped_df = df.groupby('group_column').agg({'aggregate_column': 'sum'})
# Reset the index
grouped_df = grouped_df.reset_index()
# Print the resulting DataFrame
print(grouped_df)
In this example, we group the data by the ‘group_column’ column and calculate the sum of the ‘aggregate_column’ for each group. We then reset the index of the resulting DataFrame.
You can also group by multiple columns by providing a list of column names to the groupby()
method:
grouped_df = df.groupby(['group_column1', 'group_column2']).agg({'aggregate_column': 'sum'})
This will group the data by both ‘group_column1’ and ‘group_column2’ columns.
There are many other aggregation functions you can use with agg()
method like count()
, first()
, last()
, median()
, nunique()
, etc. You can refer to the pandas documentation for more information on all the available functions.