How to merge/join data frames in pandas?

Published on Aug. 22, 2023, 12:18 p.m.

To merge or join data frames in pandas, you can use the merge() function or the join() method. Here’s a brief overview of how each one works:

merge():
The merge() function in pandas combines two data frames based on one or more columns that they have in common. To merge two data frames, specify the data frames to merge, the column(s) to merge on (the on parameter), and the type of join to perform (the how parameter: inner, outer, left, right, or cross). If how is omitted, merge() performs an inner join.

Here’s an example of merging two data frames on a common column called “key”:

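import pandas as pd

# Two data frames that share a "key" column; only 'B' and 'D' appear in both
df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
df2 = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': [5, 6, 7, 8]})

# Inner join on "key": keep only rows whose key exists in both data frames
merged = pd.merge(df1, df2, on='key', how='inner')
print(merged)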

This code merges the two data frames on the “key” column using an inner join, so only rows whose keys appear in both data frames are included in the result. Here those keys are “B” and “D”. The output is a new data frame with columns “key”, “value_x”, and “value_y”. Because both inputs have a column named “value”, pandas appends the default suffixes “_x” and “_y”: “value_x” holds the values from df1 and “value_y” holds the values from df2.
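The type of join decides which keys survive the merge. As a minimal sketch using the same two data frames, the following shows a left join and an outer join, along with the suffixes and indicator parameters of pd.merge(). The variable names left and outer and the suffix strings are illustrative choices, not part of the original example.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
df2 = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': [5, 6, 7, 8]})

# Left join: keep every row of df1; keys missing from df2 get NaN.
# suffixes replaces the default "_x"/"_y" with more descriptive names.
left = pd.merge(df1, df2, on='key', how='left', suffixes=('_left', '_right'))

# Outer join: keep keys from either data frame. indicator=True adds a
# "_merge" column marking each row as 'left_only', 'right_only', or 'both'.
outer = pd.merge(df1, df2, on='key', how='outer', indicator=True)

print(left)
print(outer)
```

The “_merge” column is a convenient way to check which rows failed to find a match before deciding whether an inner join would silently drop data you need.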

join():
The join() method in pandas works similarly to merge(), but by default it combines data frames on their indexes instead of on specific columns. To join two data frames, call join() on one data frame, pass the other as the argument, and specify the type of join to perform (inner, outer, left, or right). Unlike merge(), join() performs a left join if how is omitted.

Here’s an example of joining two data frames on their indexes:

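import pandas as pd

# Two data frames with different columns; index labels 'B' and 'D' appear in both
df1 = pd.DataFrame({'value1': [1, 2, 3, 4]}, index=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame({'value2': [5, 6, 7, 8]}, index=['B', 'D', 'E', 'F'])

# Inner join on the index: keep only labels present in both data frames
joined = df1.join(df2, how='inner')
print(joined)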

This code joins the two data frames on their indexes using an inner join, so only rows whose index labels appear in both data frames are included in the result. The output is a new data frame with columns “value1” and “value2” and index labels “B” and “D”.
