How to train a random forest model using scikit-learn in Python

Published on Aug. 22, 2023, 12:16 p.m.

To train a random forest model using scikit-learn in Python, you can use the RandomForestClassifier or RandomForestRegressor class from the scikit-learn library, depending on whether you are working with a classification or regression problem. Here’s an example code snippet for training a random forest classifier:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from joblib import dump, load

# Create a tiny toy dataset: 3 samples with 2 features each
X = np.array([[1, 2], [3, 4], [5, 6]])
y = np.array([0, 0, 1])

# Create the random forest classifier model
model = RandomForestClassifier(n_estimators=100, random_state=42)  # 100 trees; fixed seed for reproducible results

# Fit the model on the data
model.fit(X, y)

# Save the model
dump(model, 'random_forest.joblib')

# Load the model
loaded_model = load('random_forest.joblib')

# Predict the output for a new input
test_input = np.array([[7, 8]])
output = loaded_model.predict(test_input)
print(output)

In this code, we first create a small sample dataset in X and y. X is the training feature matrix (one row per sample, one column per feature), and y holds the corresponding target for each sample: class labels for classification, or numeric target values for regression.
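Three samples are enough to show the API but not to judge a model. As a sketch of a more realistic setup, the snippet below assumes scikit-learn's bundled Iris dataset (via `load_iris`) and uses `train_test_split` to hold out rows for evaluation; the dataset, split size and seeds are illustrative choices, not part of the original example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Iris ships with scikit-learn: 150 samples, 4 features, 3 classes
X, y = load_iris(return_X_y=True)
print(X.shape, y.shape)  # (150, 4) (150,)

# Hold out 25% of the rows for evaluation; stratify keeps class proportions similar
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# score() returns mean accuracy on the held-out rows
accuracy = model.score(X_test, y_test)
print(f"Test accuracy: {accuracy:.3f}")
```

Evaluating on rows the model never saw during `fit()` gives a far more honest estimate than predicting on the training data.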

We then create a RandomForestClassifier with 100 trees (n_estimators=100) and fit it to the data with the fit() method. For a regression problem, RandomForestRegressor is created and fitted the same way.
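For the regression case, a minimal sketch looks almost identical. The synthetic data here (a noisy linear function of one feature) is an assumption made purely for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data: y is roughly 3 * x plus Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * X[:, 0] + rng.normal(0, 1.0, size=200)

# Same constructor arguments as the classifier; only the class differs
reg = RandomForestRegressor(n_estimators=100, random_state=0)
reg.fit(X, y)

# predict() returns continuous values rather than class labels
pred = reg.predict(np.array([[5.0]]))
print(pred)  # a value near 3 * 5 = 15, not exact
```

The forest averages the outputs of its trees, so predictions stay within the range of target values seen during training; it will not extrapolate the linear trend beyond x = 10.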

Next, we save the trained model to disk with the dump() function from the joblib library.

Finally, we load the saved model back with joblib's load() function and call its predict() method to predict the class of a new input.
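To check that saving and loading preserves the model, a sketch like the following writes the file to a temporary directory (an illustrative choice so no file is left behind) and compares predictions before and after the round trip.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load
from sklearn.ensemble import RandomForestClassifier

X = np.array([[1, 2], [3, 4], [5, 6]])
y = np.array([0, 0, 1])
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Save and reload inside a temporary directory that is deleted afterwards
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "random_forest.joblib")
    dump(model, path)
    loaded_model = load(path)

# The restored model should reproduce the original predictions exactly
test_input = np.array([[7, 8], [2, 3]])
same = np.array_equal(model.predict(test_input), loaded_model.predict(test_input))
print(same)  # True
```

joblib files are pickle-based, so only load files from sources you trust, and load them with the same scikit-learn version that saved them.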

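This is a minimal example. In practice you would also hold out test data, tune hyperparameters such as n_estimators, max_depth, min_samples_leaf and max_features, and evaluate the model with techniques such as cross-validation, all of which scikit-learn supports.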
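As a sketch of those options, the snippet below sets a few common RandomForestClassifier hyperparameters, scores the model with 5-fold `cross_val_score`, and reads the fitted `feature_importances_`. The Iris dataset and the specific hyperparameter values are assumptions chosen for illustration, not tuned recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# A few commonly tuned hyperparameters (illustrative values only)
model = RandomForestClassifier(
    n_estimators=200,      # number of trees in the forest
    max_depth=5,           # cap on tree depth; None grows trees until leaves are pure
    min_samples_leaf=2,    # minimum number of samples required at a leaf
    max_features="sqrt",   # number of features considered at each split
    random_state=42,
)

# 5-fold cross-validation; the default score for classifiers is accuracy
scores = cross_val_score(model, X, y, cv=5)
print(f"CV accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")

# Impurity-based feature importances are available after fitting and sum to 1
model.fit(X, y)
print(model.feature_importances_)
```

Passing `n_jobs=-1` to the constructor builds trees in parallel on all CPU cores, which helps on larger datasets.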