How to convert categorical string data into numeric data in Python

Published on Aug. 22, 2023, 12:15 p.m.

There are several ways to convert categorical string data into numeric data in Python, depending on the specific format of your data and your analysis needs. Here are some common approaches:

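  1. Label Encoding: This method assigns a unique integer to each category in the data. The integers follow the sorted order of the category names and carry no meaning of their own. You can use the LabelEncoder class from scikit-learn (sklearn.preprocessing) to perform this conversion. Note that scikit-learn documents LabelEncoder as a tool for target labels; for input feature columns it provides OrdinalEncoder. For example: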
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
data['category'] = le.fit_transform(data['category'])
  2. One-Hot Encoding: This method creates one new column per category and marks each row with 1 in the column for its category and 0 in all the others. You can use the get_dummies() function from the pandas library to perform this conversion. For example:
import pandas as pd

# dtype=int gives 0/1 columns; pandas 2.0 and later return booleans by default
data = pd.get_dummies(data, columns=['category'], dtype=int)
  3. Ordinal Encoding: This method assigns an integer to each category according to the natural order of the categories. For example, small, medium, and large could be encoded as 1, 2, and 3. You can use the replace() method of a pandas Series with a dictionary that maps each category to its integer. For example:
data['size'] = data['size'].replace({'small': 1, 'medium': 2, 'large': 3})
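
The snippets above assume a DataFrame named data already exists. As a minimal, self-contained sketch, the three approaches can be tried side by side on a small made-up DataFrame. The sample values and the variable names label_codes, one_hot, size_order, and size_codes are illustrative assumptions, not part of the original examples. The ordinal step uses Series.map() in place of replace(); the two give the same result here, but map() turns any category missing from the dictionary into NaN instead of leaving the string in place.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data; the column names mirror the snippets above.
data = pd.DataFrame({
    "category": ["red", "green", "blue", "green"],
    "size": ["small", "large", "medium", "small"],
})

# 1. Label encoding: classes are sorted, then numbered from 0.
le = LabelEncoder()
label_codes = le.fit_transform(data["category"])
print(list(le.classes_))      # the category behind each integer code
print(label_codes.tolist())

# 2. One-hot encoding: one 0/1 column per category.
one_hot = pd.get_dummies(data["category"], prefix="category", dtype=int)
print(one_hot)

# 3. Ordinal encoding: the order is stated explicitly in a dictionary.
size_order = {"small": 1, "medium": 2, "large": 3}
size_codes = data["size"].map(size_order)
print(size_codes.tolist())
```

Keeping the encoded values in separate variables, as done here, leaves the original string columns intact so the results can be compared against them.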

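These are just a few examples of how to convert categorical string data into numeric data in Python. As a rule of thumb, ordinal encoding suits categories with a meaningful order, one-hot encoding suits categories without one, and label encoding is mainly useful for target labels. The best method will still depend on the specific details of your data and your analysis needs.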
