How to save trained scikit-learn models to disk?

Published on Aug. 22, 2023, 12:18 p.m.

To save trained scikit-learn models to disk, you can use the pickle module, which allows you to serialize and deserialize Python objects. Here’s an example of how to save a trained SVC() estimator to disk using pickle:

import pickle
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# load the iris dataset
iris = load_iris()

# split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target,
                                                    random_state=0)

# train an SVM model on the training set
svm = SVC(kernel='linear', C=1.0)
svm.fit(X_train, y_train)

# save the trained model to disk
with open('svm_model.pkl', 'wb') as f:
    pickle.dump(svm, f)

In this example, we load the iris dataset and split it into training and testing sets. We train an SVM model (SVC()) on the training set, and then use pickle.dump() to save the trained model to a file named svm_model.pkl.

To load the saved model from disk, you can use the pickle.load() function:

import pickle

# load the saved model from disk
with open('svm_model.pkl', 'rb') as f:
    svm = pickle.load(f)

# use the loaded model to make predictions on new data
new_data = [[5.1, 3.5, 1.4, 0.2], [6.2, 3.4, 5.4, 2.3]]
prediction = svm.predict(new_data)
print(prediction)

In this example, we use pickle.load() to load the SVM model from the file svm_model.pkl. We then use the loaded model to make predictions on some new data and print the predictions.
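One way to confirm that the save and load steps worked is to compare the restored model's predictions with the original model's on the held-out test set. The snippet below is a minimal sketch of that check, not part of the original example: it writes to a temporary directory so no file is left behind, and the names `restored`, `model_path`, and `predictions_match` are made up for illustration.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# train the same model as in the example above
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0
)
svm = SVC(kernel="linear", C=1.0)
svm.fit(X_train, y_train)

# save and reload the model inside a temporary directory (illustrative path)
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "svm_model.pkl")
    with open(model_path, "wb") as f:
        pickle.dump(svm, f)
    with open(model_path, "rb") as f:
        restored = pickle.load(f)

# the restored model should predict exactly what the original predicts
predictions_match = np.array_equal(svm.predict(X_test), restored.predict(X_test))
print("Predictions match:", predictions_match)
print("Test accuracy:", restored.score(X_test, y_test))
```

If the two sets of predictions differ, the file was most likely written or read in a different environment than the one that trained the model.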

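Keep in mind that pickle is not the only method available for saving scikit-learn models. You can also use joblib, which is typically more efficient for fitted estimators that hold large NumPy arrays internally. What matters here is the size of the fitted model, not the size of the dataset as such.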
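As a rough sketch of the joblib alternative, the snippet below saves and reloads the same kind of SVC model with `joblib.dump()` and `joblib.load()`. The file name `svm_model.joblib`, the temporary directory, and the `compress=3` setting are illustrative choices rather than requirements, and joblib is assumed to be installed (it normally comes with scikit-learn).

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

# train a small model to have something to save
iris = load_iris()
svm = SVC(kernel="linear", C=1.0)
svm.fit(iris.data, iris.target)

# save and reload with joblib; the file name and compress level are assumptions
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "svm_model.joblib")
    joblib.dump(svm, model_path, compress=3)
    loaded_svm = joblib.load(model_path)

# use the loaded model exactly like the original one
new_data = [[5.1, 3.5, 1.4, 0.2], [6.2, 3.4, 5.4, 2.3]]
joblib_prediction = loaded_svm.predict(new_data)
print(joblib_prediction)
```

The calls mirror the pickle version, except that joblib takes a file path directly, so there is no need to open the file yourself.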
