How to perform data normalization using scikit-learn?
Published on Aug. 22, 2023, 12:18 p.m.
To perform data normalization using scikit-learn, you can use either the MinMaxScaler or the StandardScaler class.
- Min-Max Normalization: This method rescales each feature (column) independently to a fixed range, [0, 1] by default. The feature's minimum maps to 0, its maximum maps to 1, and the values in between are scaled linearly. Scikit-learn provides the MinMaxScaler class for this purpose. Here’s an example:
from sklearn.preprocessing import MinMaxScaler
import numpy as np
# create sample data
data = np.array([[10, 2], [5, 3], [8, 7], [2, 9]])
# perform min-max normalization
scaler = MinMaxScaler()
normalized_data = scaler.fit_transform(data)
print(normalized_data)
In this example, MinMaxScaler normalizes the data. The fit_transform() method learns the minimum and maximum of each column and then applies the scaling in a single step.
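To make the transformation concrete, here is a minimal sketch that reproduces the min-max formula, (x - min) / (max - min), by hand with NumPy and compares it with the MinMaxScaler output. The variable names (col_min, col_max, manual, matches) are illustrative only, and the sketch assumes the default feature range of [0, 1].

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# same sample data as above, as floats
data = np.array([[10, 2], [5, 3], [8, 7], [2, 9]], dtype=float)

# per-column minimum and maximum (axis=0 works column by column)
col_min = data.min(axis=0)
col_max = data.max(axis=0)

# min-max formula applied by hand: (x - min) / (max - min)
manual = (data - col_min) / (col_max - col_min)

# the same transformation via scikit-learn (default range [0, 1])
scaled = MinMaxScaler().fit_transform(data)

# True if the hand-computed values agree with the scaler
matches = np.allclose(manual, scaled)
print(matches)
```

If a different target range is needed, MinMaxScaler also accepts a feature_range argument, for example MinMaxScaler(feature_range=(-1, 1)).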
- Z-Score Normalization (Standardization): This method rescales each feature (column) to have a mean of 0 and a standard deviation of 1, that is, zero mean and unit variance. It does so by subtracting the feature's mean and dividing by its standard deviation. Scikit-learn provides the StandardScaler class for this purpose. Here’s an example:
from sklearn.preprocessing import StandardScaler
import numpy as np
# create sample data
data = np.array([[10, 2], [5, 3], [8, 7], [2, 9]])
# perform z-score standardization
scaler = StandardScaler()
normalized_data = scaler.fit_transform(data)
print(normalized_data)
In this example, StandardScaler standardizes the data. The fit_transform() method computes the mean and standard deviation of each column and then applies the transformation in a single step.
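As a quick check of what standardization does, the sketch below computes the z-score, (x - mean) / std, by hand and compares it with the StandardScaler output. The variable names are illustrative. Note that StandardScaler uses the population standard deviation, which matches NumPy's default for std() (ddof=0).

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# same sample data as above, as floats
data = np.array([[10, 2], [5, 3], [8, 7], [2, 9]], dtype=float)

# per-column mean and population standard deviation (ddof=0)
mean = data.mean(axis=0)
std = data.std(axis=0)

# z-score applied by hand: (x - mean) / std
manual = (data - mean) / std

# the same transformation via scikit-learn
scaler = StandardScaler()
scaled = scaler.fit_transform(data)

# True if the hand-computed values agree with the scaler
matches = np.allclose(manual, scaled)
print(matches)

# the fitted statistics are stored on the scaler
print(scaler.mean_)   # per-column means learned during fit
print(scaler.scale_)  # per-column standard deviations learned during fit
```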
By using these techniques, you can normalize your data with scikit-learn and prepare it for use in machine learning models.
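When the data is split into training and test sets, a common practice is to fit the scaler on the training portion only and then reuse the fitted scaler on the test portion, so that no information from the test set leaks into the scaling statistics. The following is a minimal sketch of that pattern. The split of the sample data into train and test arrays is an assumption made purely for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# hypothetical split of the sample data: three training rows, one test row
train = np.array([[10, 2], [5, 3], [8, 7]], dtype=float)
test = np.array([[2, 9]], dtype=float)

scaler = StandardScaler()

# learn the mean and standard deviation from the training data, then scale it
train_scaled = scaler.fit_transform(train)

# reuse the training statistics on the test data (no refitting)
test_scaled = scaler.transform(test)

# inverse_transform maps scaled values back to the original units
restored = scaler.inverse_transform(test_scaled)

print(train_scaled)
print(test_scaled)
print(restored)
```

The same fit-then-transform pattern applies to MinMaxScaler. With min-max scaling, test values that fall outside the training minimum and maximum end up outside the [0, 1] range.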