How to normalize data in pandas?
Published on Aug. 22, 2023, 12:18 p.m.
To normalize data in pandas, you can use the sklearn.preprocessing module, which includes the MinMaxScaler class for feature scaling. By default, this class rescales each column of a DataFrame so that its values lie between 0 and 1.
Here is an example of how to normalize a DataFrame in pandas:
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
# Create a DataFrame to normalize
df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [10, 20, 30, 40]})
# Create a MinMaxScaler object
scaler = MinMaxScaler()
# Normalize the DataFrame
df_normalized = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)
# Print the normalized DataFrame
print(df_normalized)
In this code, we create a pandas DataFrame df with two columns. We then create a MinMaxScaler() object and call its fit_transform() method to normalize the DataFrame. The resulting ndarray is then converted back to a new DataFrame with the same column names using pd.DataFrame(). Finally, we print the normalized DataFrame.
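For comparison, the same min-max rescaling can be written with pandas alone. This is a minimal sketch, assuming the default 0-to-1 target range and the same example DataFrame; the names col_min, col_range and df_minmax are illustrative and not part of either library.

```python
import pandas as pd

# Same example DataFrame as above
df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [10, 20, 30, 40]})

# Per-column minimum and range (max - min)
col_min = df.min()
col_range = df.max() - col_min

# Min-max formula applied column by column: (x - min) / (max - min)
df_minmax = (df - col_min) / col_range

print(df_minmax)
```

One caveat with this hand-written version: a column whose values are all identical has a range of zero, so the division yields NaN for that column.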
Note that min-max normalization works best when the original data has a known, bounded range and you want to make it comparable across different columns or datasets. Because the result depends on each column's minimum and maximum, a single outlier can squeeze the remaining values into a narrow band. If the original data does not have a known range, or the range is irrelevant, other scaling methods such as standardization may be more appropriate.
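As a rough illustration of standardization, the sketch below rescales each column to zero mean and unit variance using pandas alone. It assumes the population standard deviation (ddof=0), which is also the convention used by StandardScaler in sklearn.preprocessing; the name df_standardized is illustrative.

```python
import pandas as pd

# Same example DataFrame as above
df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [10, 20, 30, 40]})

# Standardize each column: subtract the mean, divide by the standard deviation.
# ddof=0 selects the population standard deviation (pandas defaults to ddof=1).
df_standardized = (df - df.mean()) / df.std(ddof=0)

print(df_standardized)
```

Unlike min-max scaling, the result is not confined to a fixed interval; the values are expressed as a number of standard deviations from the column mean.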
Also note that the MinMaxScaler class only normalizes across columns, not rows, and its fit_transform() method does not accept an axis argument. If you want to normalize across rows, you can transpose the DataFrame before scaling and transpose the result back:
# Normalize the DataFrame across rows
df_normalized_rows = pd.DataFrame(scaler.fit_transform(df.T).T, columns=df.columns)
# Print the normalized DataFrame
print(df_normalized_rows)
In this code, we transpose the original DataFrame with .T, apply the MinMaxScaler, and then transpose back with .T to obtain the normalized DataFrame across rows.
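Row-wise scaling can also be sketched in pandas without transposing, by computing each row's minimum and range and aligning them on the index. The example below adds a hypothetical third column, col3, purely so that each row has a value strictly between its minimum and maximum; the names df_rows, row_min, row_range and df_rows_scaled are likewise illustrative.

```python
import pandas as pd

# Example data; col3 is a hypothetical extra column added for illustration
df_rows = pd.DataFrame({
    'col1': [1, 2, 3, 4],
    'col2': [10, 20, 30, 40],
    'col3': [5, 8, 12, 30],
})

# Per-row minimum and range (axis=1 reduces across the columns of each row)
row_min = df_rows.min(axis=1)
row_range = df_rows.max(axis=1) - row_min

# axis=0 aligns the per-row Series with the DataFrame's index
df_rows_scaled = df_rows.sub(row_min, axis=0).div(row_range, axis=0)

print(df_rows_scaled)
```

In each row the smallest value becomes 0 and the largest becomes 1. This version also keeps the original index and column labels, since no ndarray conversion is involved.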