How to select specific columns in pandas?

Published on Aug. 22, 2023, 12:18 p.m.

To select specific columns in a pandas DataFrame, you can use bracket indexing [] with a list of column names, or the .loc[] and .iloc[] indexers. Here are some examples:

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]})

# Select the col1 and col3 columns using bracket indexing
df_new = df[['col1', 'col3']]

# Select the col1 and col3 columns using .loc[]
df_new = df.loc[:, ['col1', 'col3']]

# Select the first two columns (col1 and col2) by position using .iloc[]
df_new = df.iloc[:, 0:2]

# Print the new DataFrame (df_new now holds the .iloc[] result: col1 and col2)
print(df_new)

In this code, we create a pandas DataFrame df with three columns. The df[['col1', 'col3']] statement selects the col1 and col3 columns using bracket indexing, and df.loc[:, ['col1', 'col3']] selects the same two columns by label. The df.iloc[:, 0:2] statement is different: it selects by integer position, and because the stop position of the slice is excluded, it returns the first two columns, col1 and col2. In both .loc[] and .iloc[], the part before the comma selects rows (a bare : means all rows) and the part after the comma selects columns. The key difference is that .loc[] takes row and column labels, while .iloc[] takes integer positions.
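
To make the label versus position distinction concrete, here is a minimal sketch that reuses the same example DataFrame. The variable names by_position, by_label_slice, and by_position_slice are illustrative only and do not appear in the original example.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]})

# Positions 0 and 2 correspond to col1 and col3,
# so this matches df[['col1', 'col3']]
by_position = df.iloc[:, [0, 2]]

# Label slices in .loc[] include both endpoints
by_label_slice = df.loc[:, 'col1':'col2']

# Integer slices in .iloc[] exclude the stop position
by_position_slice = df.iloc[:, 0:2]

print(list(by_position.columns))        # ['col1', 'col3']
print(list(by_label_slice.columns))     # ['col1', 'col2']
print(list(by_position_slice.columns))  # ['col1', 'col2']
```

Passing a list of positions to .iloc[] is the positional counterpart of passing a list of names to .loc[], which is useful when the column order is known but the names are not.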

All three methods return a new DataFrame containing only the selected columns and leave the original untouched. If you want the name df to refer to the reduced DataFrame, you can assign the result back to df:

# Select the col1 and col3 columns and assign the result back to df
df = df[['col1', 'col3']]

# Print the updated DataFrame
print(df)

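In this code, we assign the result of df[['col1', 'col3']] back to df, so df now refers to a new DataFrame that contains only the col1 and col3 columns. Strictly speaking, the original object is not modified in place: the name df is simply rebound to the new DataFrame, and any other variable that still references the original keeps all three columns.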
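
The following sketch illustrates that behaviour under the same example data. The name original is a hypothetical second reference added here for demonstration. It also shows DataFrame.drop with inplace=True as one possible way to remove a column from the existing object itself, which may or may not suit your use case.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]})

# Keep a second reference to the same object (hypothetical name)
original = df

# Rebind df to a new DataFrame holding only col1 and col3
df = df[['col1', 'col3']]

print(list(df.columns))        # ['col1', 'col3']
print(list(original.columns))  # ['col1', 'col2', 'col3']
print(df is original)          # False

# Alternative: drop col2 from the existing object in place
original.drop(columns=['col2'], inplace=True)
print(list(original.columns))  # ['col1', 'col3']
```

Reassignment is usually the simpler choice. The in-place variant only matters when other code holds a reference to the same DataFrame and should see the change.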
