How to manipulate and transform data using pandas

Published on Aug. 22, 2023, 12:17 p.m.

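pandas provides a range of functions and methods for manipulating and transforming data. The examples below cover five common operations; they assume that pandas has been imported as pd and that df, df1, and df2 are existing DataFrames.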

  1. Filtering data: You can filter the rows of a DataFrame by passing a boolean condition to the loc[] indexer. For example, to select all rows where the ‘age’ column is greater than 40:
filtered = df.loc[df['age'] > 40]
  2. Adding columns: You can add a new column to a DataFrame by assigning a list of values, or an expression computed from existing columns, to a new column name. For example, to add a new column called ‘income’ that doubles the ‘salary’ column:
df['income'] = df['salary'] * 2
  3. Grouping data: You can group data by one or more columns using the groupby() method, and then apply aggregations such as sum(), mean(), or a custom function. For example, to group by the ‘department’ column and calculate the mean of the ‘salary’ column:
grouped = df.groupby('department').agg({'salary': 'mean'})
  4. Merging data: You can merge two DataFrames on a common column using the merge() function. By default this is an inner join, so only rows whose key appears in both DataFrames are kept. For example, to merge two DataFrames, df1 and df2, on the ‘id’ column:
merged = pd.merge(df1, df2, on='id')
  5. Pivot tables: You can create a pivot table to summarize data across rows and columns. For example, to create a pivot table that shows the average salary for each combination of department and gender:
pivot = df.pivot_table(index='department', columns='gender', values='salary', aggfunc='mean')
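
To try the five operations above end to end, here is a minimal sketch that runs them on a small made-up dataset. The sample values and the second DataFrame (called offices here, with an ‘office’ column) are assumptions added for illustration; only the column names ‘id’, ‘age’, ‘salary’, ‘department’, and ‘gender’ come from the examples above.

```python
import pandas as pd

# Hypothetical sample data; the values are invented for illustration.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "department": ["Sales", "Sales", "IT", "IT"],
    "gender": ["F", "M", "F", "M"],
    "age": [45, 38, 52, 29],
    "salary": [50000, 42000, 61000, 39000],
})

# Hypothetical second table that shares the 'id' column (no row for id 4).
offices = pd.DataFrame({
    "id": [1, 2, 3],
    "office": ["Berlin", "Paris", "Rome"],
})

# 1. Filtering: keep rows where 'age' is greater than 40.
filtered = df.loc[df["age"] > 40]

# 2. Adding a column: 'income' is twice 'salary', computed for all rows at once.
df["income"] = df["salary"] * 2

# 3. Grouping: mean salary per department.
grouped = df.groupby("department").agg({"salary": "mean"})

# 4. Merging: the default is an inner join, so id 4 (missing from offices) is dropped.
merged = pd.merge(df, offices, on="id")

# 5. Pivot table: mean salary for each department/gender combination.
pivot = df.pivot_table(index="department", columns="gender",
                       values="salary", aggfunc="mean")

print(filtered, grouped, merged, pivot, sep="\n\n")
```

If the unmatched row should be kept in the merge, passing how='left' to pd.merge() is one option; the missing ‘office’ value is then filled with NaN.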

These are just a few examples of the many operations you can perform on pandas DataFrames. Refer to the pandas documentation for more information on all the available functions and methods.
