How to select specific columns in pandas?
Published on Aug. 22, 2023, 12:18 p.m.
To select specific columns in a pandas DataFrame, you can use bracket indexing with [] or the .loc[] and .iloc[] indexers. Here are some examples:
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]})
# Select the col1 and col3 columns using bracket indexing
df_new = df[['col1', 'col3']]
# Select the col1 and col3 columns using .loc[]
df_new = df.loc[:, ['col1', 'col3']]
# Select the first two columns using .iloc[]
df_new = df.iloc[:, 0:2]
# Print the new DataFrame
print(df_new)
In this code, we create a pandas DataFrame df with three columns. The df[['col1', 'col3']] statement selects the col1 and col3 columns using bracket indexing, and df.loc[:, ['col1', 'col3']] selects the same two columns by label using .loc[]. The df.iloc[:, 0:2] statement is different: it selects the first two columns by position, which are col1 and col2, so the final print shows col1 and col2. Note that .loc[] and .iloc[] address data differently: .loc[] takes row and column labels, while .iloc[] takes integer positions.
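To make the difference between label-based and position-based selection concrete, here is a minimal sketch that reuses the example DataFrame. The variable names by_position, by_label_slice, and by_position_slice are illustrative only and do not appear in the original example.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]})

# Positional equivalent of df[['col1', 'col3']]: columns at positions 0 and 2
by_position = df.iloc[:, [0, 2]]

# A label slice in .loc[] includes the end label
by_label_slice = df.loc[:, 'col1':'col2']

# A positional slice in .iloc[] excludes the end position
by_position_slice = df.iloc[:, 0:2]

print(list(by_position.columns))        # ['col1', 'col3']
print(list(by_label_slice.columns))     # ['col1', 'col2']
print(list(by_position_slice.columns))  # ['col1', 'col2']
```

Passing a list of positions to .iloc[] is the way to pick non-adjacent columns by position, while a slice is convenient for a contiguous range.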
All three methods return a new DataFrame with only the selected columns and leave the original df unchanged. If you want df itself to hold only the selected columns, you can assign the result back to df:
# Select the col1 and col3 columns and reassign the result to df
df = df[['col1', 'col3']]
# Print the updated DataFrame
print(df)
In this code, we assign the result of df[['col1', 'col3']] back to df, so the name df now refers to a DataFrame that contains only the col1 and col3 columns.
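As a rough illustration of what the reassignment does, the sketch below keeps a second reference to the DataFrame before reassigning df. The name original is hypothetical and is used here only to show that the first object still has all three columns.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6], 'col3': [7, 8, 9]})
original = df  # second name pointing at the same DataFrame object

# Rebind the name df to the two-column selection
df = df[['col1', 'col3']]

print(list(df.columns))        # ['col1', 'col3']
print(list(original.columns))  # ['col1', 'col2', 'col3']
print(df is original)          # False
```

If other parts of a program hold a reference to the original DataFrame, they will still see all three columns after the reassignment.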