How to manipulate and transform data using pandas
Published on Aug. 22, 2023, 12:17 p.m.
pandas provides a variety of functions and methods for manipulating and transforming data. Here are some examples:
- Filtering data: You can filter the rows of a DataFrame by passing a boolean condition to the loc[] indexer. For example, to keep only the rows where the ‘age’ column is greater than 40:
filtered = df.loc[df['age'] > 40]
- Adding columns: You can add a new column to a DataFrame by assigning a list of values, an expression computed from existing columns, or the result of a function applied to each row. For example, to add a new column called ‘income’ that holds double the value of the ‘salary’ column:
df['income'] = df['salary'] * 2
- Grouping data: You can group data by one or more columns using the groupby() method, and then apply aggregation functions such as sum(), mean(), or a custom function. For example, to group by the ‘department’ column and calculate the mean of the ‘salary’ column:
grouped = df.groupby('department').agg({'salary': 'mean'})
- Merging data: You can merge two DataFrames on a common column using the merge() function. For example, to merge two DataFrames, df1 and df2, on the ‘id’ column:
merged = pd.merge(df1, df2, on='id')
- Pivot tables: You can create a pivot table to summarize data across rows and columns. For example, to create a pivot table that shows the average salary for each combination of department and gender:
pivot = df.pivot_table(index='department', columns='gender', values='salary', aggfunc='mean')
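The snippets above assume that df, df1, and df2 already exist. As a rough sketch of how they fit together, the following runs all five operations on a small made-up dataset. The sample values and the ‘city’ column are invented for illustration; only the column names ‘id’, ‘age’, ‘salary’, ‘department’, and ‘gender’ come from the examples above.

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "age": [25, 45, 52, 38],
    "salary": [50000, 60000, 80000, 70000],
    "department": ["Sales", "Sales", "IT", "IT"],
    "gender": ["F", "M", "F", "M"],
})

# Filtering: keep only the rows where 'age' is greater than 40.
filtered = df.loc[df["age"] > 40]

# Adding a column: 'income' is computed element-wise from 'salary'.
df["income"] = df["salary"] * 2

# Grouping: mean salary per department.
grouped = df.groupby("department").agg({"salary": "mean"})

# Merging: df2 is a second, made-up table sharing the 'id' column.
df1 = df[["id", "age"]]
df2 = pd.DataFrame({"id": [1, 2, 5], "city": ["Oslo", "Lima", "Pune"]})
# pd.merge performs an inner join by default, so only ids present
# in both tables (1 and 2 here) appear in the result.
merged = pd.merge(df1, df2, on="id")

# Pivot table: mean salary for each department/gender combination.
pivot = df.pivot_table(
    index="department", columns="gender", values="salary", aggfunc="mean"
)

print(filtered)
print(grouped)
print(merged)
print(pivot)
```

Note that merge() drops rows without a match in the other table unless you pass a different value for its how parameter, such as how='left' or how='outer'.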
These are just a few examples of the many operations you can perform on pandas DataFrames. Refer to the pandas documentation for more information on all the available functions and methods.