How to convert categorical string data into numeric data in Python
Published on Aug. 22, 2023, 12:15 p.m.
There are several ways to convert categorical string data into numeric data in Python, depending on the specific format of your data and your analysis needs. Here are some common approaches:
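- Label Encoding: This method assigns a unique integer to each category in the data. The integers follow the sorted order of the category names, so they identify categories but do not rank them. You can use the LabelEncoder class from the sklearn library to perform this conversion. For example: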
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
data['category'] = le.fit_transform(data['category'])
- One-Hot Encoding: This method creates a new column for each category in the data and marks each row with a binary value: 1 (or True) in the column for its category and 0 (or False) in the others. You can use the get_dummies() function from the pandas library to perform this conversion. For example:
import pandas as pd
data = pd.get_dummies(data, columns=['category'])
- Ordinal Encoding: This method assigns an integer to each category based on the order of the categories. For example, small, medium, and large could be encoded as 1, 2, and 3. You can use the replace() method of a pandas Series, passing a dictionary that maps each category to its integer, to perform this conversion. For example:
data['size'] = data['size'].replace({'small': 1, 'medium': 2, 'large': 3})
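To see how the three approaches compare side by side, here is a minimal sketch on a small made-up DataFrame. The sample values and the variable names (label_encoded, one_hot, size_order, ordinal) are assumptions for illustration only. The ordinal step uses map() as an alternative to replace(); unlike replace(), it turns any value missing from the dictionary into NaN.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data; the column names mirror the snippets above.
data = pd.DataFrame({
    "category": ["red", "green", "blue", "green"],
    "size": ["small", "large", "medium", "small"],
})

# Label encoding: classes are sorted alphabetically, then numbered from 0.
le = LabelEncoder()
label_encoded = le.fit_transform(data["category"])

# One-hot encoding: one indicator column per distinct category.
# dtype=int requests 0/1 instead of the boolean default in recent pandas.
one_hot = pd.get_dummies(data, columns=["category"], dtype=int)

# Ordinal encoding: an explicit mapping preserves the intended order.
size_order = {"small": 1, "medium": 2, "large": 3}
ordinal = data["size"].map(size_order)

print(list(le.classes_))
print(label_encoded)
print(one_hot)
print(ordinal.tolist())
```

After fitting, le.inverse_transform() can convert the integers back to the original strings, which is useful when you need to report results with the category names.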
These are just a few examples of how to convert categorical string data into numeric data in Python. The best method depends on your data and your analysis needs, in particular on whether the categories have a natural order (ordinal encoding) or not (one-hot encoding is usually the safer choice).