How to perform data normalization using scikit-learn?

Published on Aug. 22, 2023, 12:18 p.m.

To perform data normalization using scikit-learn, you can use the MinMaxScaler or the StandardScaler class from the sklearn.preprocessing module.

  1. Min-Max Normalization: This method rescales each feature (column) to a fixed range, which is [0, 1] by default. For each feature, the minimum value maps to 0, the maximum value maps to 1, and every other value falls proportionally in between. Scikit-learn provides the MinMaxScaler class for this purpose. Here’s an example:
from sklearn.preprocessing import MinMaxScaler
import numpy as np

# create sample data
data = np.array([[10, 2], [5, 3], [8, 7], [2, 9]])

# perform min-max normalization
scaler = MinMaxScaler()
normalized_data = scaler.fit_transform(data)

print(normalized_data)

In this example, MinMaxScaler normalizes the data column by column. The fit_transform() method learns each column's minimum and maximum from the data and then rescales the data in a single step.
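To make the rescaling concrete, here is a minimal sketch that recomputes the same result by hand with the formula (x - min) / (max - min), assuming the default [0, 1] range. The names col_min, col_max, manual, and matches are illustrative only and do not come from scikit-learn.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# same sample data as above, stored as floats
data = np.array([[10, 2], [5, 3], [8, 7], [2, 9]], dtype=float)

# per-column minimum and maximum
col_min = data.min(axis=0)
col_max = data.max(axis=0)

# min-max formula applied column by column
manual = (data - col_min) / (col_max - col_min)

# compare against scikit-learn's result
scaled = MinMaxScaler().fit_transform(data)
matches = np.allclose(manual, scaled)

print(manual)
print(matches)
```

If the two results agree, the final line prints True, which suggests the scaler is applying this formula to each column independently.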

  2. Z-Score Normalization (Standardization): This method rescales each feature (column) to have a mean of 0 and a standard deviation of 1 by subtracting the feature's mean and dividing by its standard deviation. Scikit-learn provides the StandardScaler class for this purpose. Here’s an example:
from sklearn.preprocessing import StandardScaler
import numpy as np

# create sample data
data = np.array([[10, 2], [5, 3], [8, 7], [2, 9]])

# perform Z-score normalization
scaler = StandardScaler()
normalized_data = scaler.fit_transform(data)

print(normalized_data)

In this example, StandardScaler standardizes the data column by column. The fit_transform() method learns each column's mean and standard deviation from the data and then applies the transformation in a single step.
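As a rough check on what standardization does, the sketch below recomputes the z-scores by hand as (x - mean) / std and compares them with the StandardScaler output. It assumes the population standard deviation (NumPy's default, ddof=0), which is what StandardScaler uses. The names mean, std, manual_z, and z_matches are illustrative only.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# same sample data as above, stored as floats
data = np.array([[10, 2], [5, 3], [8, 7], [2, 9]], dtype=float)

# per-column mean and population standard deviation (ddof=0)
mean = data.mean(axis=0)
std = data.std(axis=0)

# z-score formula applied column by column
manual_z = (data - mean) / std

# compare against scikit-learn's result
scaled_z = StandardScaler().fit_transform(data)
z_matches = np.allclose(manual_z, scaled_z)

print(manual_z)
print(z_matches)
```

After the transformation, each column should have a mean of approximately 0 and a standard deviation of approximately 1, up to floating-point error.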

With these two scalers, you can normalize your data in scikit-learn and prepare it for use in machine learning models.
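When the normalized data feeds a model, a common pattern is to fit the scaler once and reuse it on data that arrives later. The sketch below is one way to do that with fit() and transform(), under the assumption that new_rows stands in for unseen data; the train and new_rows arrays are hypothetical and only serve the illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# data the scaler learns from
train = np.array([[10, 2], [5, 3], [8, 7], [2, 9]], dtype=float)

# hypothetical rows seen later (for example, at prediction time)
new_rows = np.array([[6, 5], [12, 1]], dtype=float)

# learn the per-column minimum and maximum from the training data only
scaler = MinMaxScaler()
scaler.fit(train)

# reuse the learned minimum and maximum on the new rows
new_scaled = scaler.transform(new_rows)

print(new_scaled)
```

Because the second row lies outside the range seen during fitting, its scaled values fall outside [0, 1] with the default settings. That is expected when the same learned parameters are applied to new data.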