How to Standardize Data in Python
Published on Aug. 22, 2023, 12:16 p.m.
To standardize data in Python, you can use the StandardScaler class from the scikit-learn library. Here’s an example of how to use StandardScaler to standardize a dataset:
from sklearn.preprocessing import StandardScaler
import pandas as pd
# load the dataset into a pandas DataFrame
df = pd.read_csv('example_data.csv')
# prepare the data by separating the input features (X) from the target variable (y)
# (StandardScaler expects the feature columns in X to be numeric)
X = df.drop('target_variable', axis=1)
y = df['target_variable']
# create a StandardScaler object and use it to scale the input features
scaler = StandardScaler()
scaled_X = scaler.fit_transform(X)
# print the first five rows of the scaled data
print(scaled_X[:5])
In this example, we first load the example data into a pandas DataFrame and separate the input features from the target variable. We then create a StandardScaler object and use it to fit and transform the input features using the fit_transform() method. Finally, we print the first five rows of the scaled data to the console.
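Note that the fit_transform() method computes the mean and standard deviation of each feature from the data it is given and stores them in the StandardScaler object. If you want to apply the same scaling to new data later on, call the transform() method on the fitted scaler instead. Calling fit_transform() again would recompute the statistics from the new data.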
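As a minimal sketch of that fit-then-transform pattern, the snippet below fits a scaler on some training data and then reuses it on a new row. The arrays X_train and X_new are made-up values for illustration and are not part of the example dataset above.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: two features on very different scales
X_train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
X_new = np.array([[4.0, 400.0]])

scaler = StandardScaler()

# fit_transform() learns the per-feature mean and standard deviation from X_train
X_train_scaled = scaler.fit_transform(X_train)

# transform() reuses the statistics learned above; it does not refit the scaler
X_new_scaled = scaler.transform(X_new)

print(scaler.mean_)    # per-feature means learned from X_train
print(scaler.scale_)   # per-feature standard deviations learned from X_train
print(X_new_scaled)
```

Because X_new is scaled with the training statistics, its values are not expected to have a mean of 0 themselves. They are simply expressed on the same scale as the training data.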
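When you standardize data, you transform each feature to have a mean of 0 and a standard deviation of 1 by subtracting the feature's mean from every value and dividing the result by the feature's standard deviation. Standardization is often useful when the input features have different units and scales, and you want to make sure that no feature dominates the analysis simply because its values are larger.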
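To make the arithmetic concrete, here is a small sketch that standardizes a single feature by hand using only the Python standard library, computing z = (x - mean) / std for each value. The list heights_cm is a hypothetical feature invented for this illustration. It uses the population standard deviation (dividing by n), which is also what StandardScaler uses.

```python
import statistics

# Hypothetical feature values, for illustration only
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]

mean = statistics.fmean(heights_cm)
std = statistics.pstdev(heights_cm)  # population standard deviation (divides by n)

# Subtract the mean and divide by the standard deviation
z_scores = [(x - mean) / std for x in heights_cm]

print(z_scores)
print(statistics.fmean(z_scores))   # approximately 0
print(statistics.pstdev(z_scores))  # approximately 1
```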