How to Perform One-Hot Encoding in Python

Published on Aug. 22, 2023, 12:16 p.m.

To perform one-hot encoding in Python, you can use the pandas library, which provides a function called get_dummies() that converts categorical variables into a set of binary columns. Here is an example:

import pandas as pd

# Create a dataframe with a categorical variable named 'color'
data = {'color': ['red', 'green', 'blue', 'blue', 'green', 'red']}
df = pd.DataFrame(data)

# Perform one-hot encoding using the get_dummies() function
one_hot = pd.get_dummies(df['color'])

# Add the one-hot encoded columns to the original dataframe
df = df.join(one_hot)

# Print the resulting dataframe
print(df)

In this example, we first create a dataframe with a categorical variable named ‘color’. We then use the get_dummies() function to perform one-hot encoding on the ‘color’ variable, which generates one binary column for each unique value (here ‘blue’, ‘green’ and ‘red’), named after that value. Finally, we join the resulting one-hot encoded columns to the original dataframe and print the result. Note that in pandas 2.0 and later the new columns hold True/False values by default; pass dtype=int to get_dummies() if you want 0/1 instead.
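As a variation on the example above, get_dummies() can also take the whole dataframe and encode only the columns you name. The sketch below assumes a second, made-up numeric column called 'size' to show that untouched columns are carried over; the 'color' prefix and dtype=int are illustrative choices, not requirements.

```python
import pandas as pd

# Sample data; the 'size' column is a hypothetical addition for illustration
df = pd.DataFrame({
    'color': ['red', 'green', 'blue', 'blue', 'green', 'red'],
    'size': [1, 2, 3, 1, 2, 3],
})

# Encode only the 'color' column, replacing it with prefixed 0/1 columns
encoded = pd.get_dummies(df, columns=['color'], prefix='color', dtype=int)

print(encoded)
```

With this form there is no need for a separate join step, because the encoded columns replace the original 'color' column in the returned dataframe.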

Note that there are other methods for performing one-hot encoding, such as using the sklearn library’s OneHotEncoder() or LabelBinarizer() classes, as well as using the numpy library’s eye() function.
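To make two of these alternatives concrete, here is a minimal sketch using LabelBinarizer and numpy's eye() on the same color values. The variable names are illustrative, and the eye() approach assumes the categories are first mapped to integer codes (done here with np.unique).

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer

colors = ['red', 'green', 'blue', 'blue', 'green', 'red']

# Option 1: LabelBinarizer learns the classes and returns one column per class
binarizer = LabelBinarizer()
lb_encoded = binarizer.fit_transform(colors)

# Option 2: map each value to an integer code, then pick rows of an identity matrix
categories, codes = np.unique(colors, return_inverse=True)
eye_encoded = np.eye(len(categories), dtype=int)[codes]

print(binarizer.classes_)
print(lb_encoded)
print(eye_encoded)
```

Both approaches order the columns by sorted category name. One caveat: when there are only two classes, LabelBinarizer returns a single column rather than two.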

How to Perform One-Hot Encoding Using scikit-learn

To perform one-hot encoding using scikit-learn in Python, you can use the OneHotEncoder class from the sklearn.preprocessing module. Here is an example of how to use it:

from sklearn.preprocessing import OneHotEncoder
import pandas as pd

# create a sample dataframe
df = pd.DataFrame({'category': ['A', 'B', 'C', 'A', 'B', 'C']})

# create an instance of the OneHotEncoder
encoder = OneHotEncoder()

# fit the encoder to the data
encoder.fit(df)

# transform the data
one_hot_encoded = encoder.transform(df).toarray()

# view the one-hot encoded data
print(one_hot_encoded)

In this example, we created a sample dataframe containing categorical data, and then created an instance of the OneHotEncoder. We then fit the encoder to the data and transformed it using the transform method. By default, transform returns a sparse matrix, so we call toarray() to convert it into a regular numpy array, which we can view using the print() function.

Note that the OneHotEncoder accepts both string and numerical categorical values directly (in scikit-learn 0.20 and later), so you do not need to apply label encoding or ordinal encoding first. For each input column, the encoder sorts the unique values it finds, stores them in its categories_ attribute, and produces one output column per value in that order.
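A quick way to check how OneHotEncoder treats a given input is to inspect its categories_ attribute after fitting. The sketch below assumes scikit-learn 0.20 or later and uses a made-up dataframe with one string column ('grade') and one integer column ('code'), both passed to the encoder as they are.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: one string column and one integer column
df = pd.DataFrame({
    'grade': ['low', 'high', 'medium', 'low'],
    'code': [10, 20, 10, 30],
})

# Fit and transform both columns in one step
encoder = OneHotEncoder()
encoded = encoder.fit_transform(df).toarray()

# Show the categories learned for each input column
for name, cats in zip(df.columns, encoder.categories_):
    print(name, list(cats))

print(encoded)
```

Each input column contributes one block of output columns, so the three grades and three codes here give six columns in total.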