How to merge/join two DataFrames in pandas?

Published on Aug. 22, 2023, 12:17 p.m.

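You can merge or join two DataFrames in pandas using the pd.merge() function (also available as the DataFrame.merge() method) or the DataFrame.join() method. By default, merge() matches rows on one or more columns, while join() matches them on the index. Here’s an example of using merge():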

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
df2 = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': [5, 6, 7, 8]})

# Merge the two DataFrames on the 'key' column
merged_df = pd.merge(df1, df2, on='key')

print(merged_df)

This will output:

  key  value_x  value_y
0   B        2        5
1   D        4        6

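In the example above, we merged the two DataFrames on the ‘key’ column. Because merge() performs an inner join by default, the result contains only the keys present in both DataFrames (B and D). Both inputs also have a non-key column named ‘value’, so pandas appends its default suffixes, _x and _y, to tell the two columns apart.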
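If _x and _y are not descriptive enough, the suffixes parameter of merge() lets you choose the labels. The sketch below reuses df1 and df2 from above; the suffix strings '_left' and '_right' are arbitrary choices for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
df2 = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': [5, 6, 7, 8]})

# suffixes replaces the default ('_x', '_y') for overlapping non-key columns
merged_df = pd.merge(df1, df2, on='key', suffixes=('_left', '_right'))

print(merged_df)
```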

To use a different type of join, pass how='left', how='right', or how='outer' (the default is how='inner'). To merge on multiple columns, pass a list of column names to the on parameter.
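As a rough sketch of both options: the first part reuses df1 and df2 with how='left' and how='outer' (indicator=True adds a '_merge' column recording whether each row came from the left frame, the right frame, or both). The second part merges on two columns; the orders and staff frames and their store/day columns are hypothetical data made up for this illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [1, 2, 3, 4]})
df2 = pd.DataFrame({'key': ['B', 'D', 'E', 'F'], 'value': [5, 6, 7, 8]})

# Left join: keep every row of df1; keys missing from df2 get NaN in value_y
left_df = pd.merge(df1, df2, on='key', how='left')

# Outer join: keep keys from either side; '_merge' shows where each row came from
outer_df = pd.merge(df1, df2, on='key', how='outer', indicator=True)

# Hypothetical frames for a multi-column merge: a row matches only when
# both 'store' and 'day' are equal
orders = pd.DataFrame({'store': ['X', 'X', 'Y'], 'day': [1, 2, 1], 'sales': [10, 20, 30]})
staff = pd.DataFrame({'store': ['X', 'Y', 'Y'], 'day': [1, 1, 2], 'staff': [3, 4, 5]})
by_both = pd.merge(orders, staff, on=['store', 'day'])

print(left_df)
print(outer_df)
print(by_both)
```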

Here is an example of using the join() method:

import pandas as pd

# Create two DataFrames
df1 = pd.DataFrame({'value': [1, 2, 3, 4]}, index=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame({'value': [5, 6, 7, 8]}, index=['B', 'D', 'E', 'F'])

# Join df2 onto df1 by matching index labels; the suffixes distinguish the two 'value' columns
joined_df = df1.join(df2, lsuffix='_df1', rsuffix='_df2')

print(joined_df)

This will output:

   value_df1  value_df2
A          1        NaN
B          2        5.0
C          3        NaN
D          4        6.0

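In the example above, we used the join() method to combine the two DataFrames on their indices. join() performs a left join by default, so the result keeps every row of df1 and only those rows of df2 whose index labels also appear in df1 (B and D). Labels in df1 with no match in df2 (A and C) get NaN, which is why value_df2 is converted to floats.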
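join() also accepts a how parameter, and an index-on-index join can be written with merge() via left_index=True and right_index=True. The sketch below, which reuses the index-based df1 and df2 from above, uses how='inner' to keep only the shared labels and then checks that both spellings produce the same DataFrame.

```python
import pandas as pd

df1 = pd.DataFrame({'value': [1, 2, 3, 4]}, index=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame({'value': [5, 6, 7, 8]}, index=['B', 'D', 'E', 'F'])

# how='inner' keeps only index labels present in both frames
inner_joined = df1.join(df2, lsuffix='_df1', rsuffix='_df2', how='inner')

# Equivalent merge() call: match on both indexes instead of on a column
inner_merged = pd.merge(df1, df2, left_index=True, right_index=True,
                        how='inner', suffixes=('_df1', '_df2'))

print(inner_joined)
print(inner_joined.equals(inner_merged))
```

Because no rows are left unmatched in an inner join, value_df2 stays an integer column here.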
