How to save trained scikit-learn models to disk?
Published on Aug. 22, 2023, 12:18 p.m.
To save trained scikit-learn models to disk, you can use Python's built-in pickle module, which serializes and deserializes Python objects. Here’s an example of how to save a trained SVC() estimator to disk using pickle:
```python
import pickle
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# load the iris dataset
iris = load_iris()

# split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target,
                                                    random_state=0)

# train an SVM model on the training set
svm = SVC(kernel='linear', C=1.0)
svm.fit(X_train, y_train)

# save the trained model to disk
with open('svm_model.pkl', 'wb') as f:
    pickle.dump(svm, f)
```
In this example, we load the iris dataset and split it into training and testing sets. We train an SVM model (SVC()) on the training set, and then use pickle.dump() to save the trained model to a file named svm_model.pkl.
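A pickled estimator is generally only expected to load correctly with the same scikit-learn version that saved it, so it can help to store that version next to the model. The sketch below assumes a hypothetical layout: a plain dict with "model" and "sklearn_version" keys written to a hypothetical file svm_bundle.pkl. This is a convention of our own, not a scikit-learn API. It also passes protocol=pickle.HIGHEST_PROTOCOL, an optional argument of pickle.dump().

```python
import pickle

import sklearn
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# train the same model as above
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0
)
svm = SVC(kernel='linear', C=1.0)
svm.fit(X_train, y_train)

# hypothetical bundle layout: the fitted model plus the library version
bundle = {
    "model": svm,
    "sklearn_version": sklearn.__version__,
}
with open('svm_bundle.pkl', 'wb') as f:
    pickle.dump(bundle, f, protocol=pickle.HIGHEST_PROTOCOL)

# on load, compare the stored version with the installed one
with open('svm_bundle.pkl', 'rb') as f:
    loaded = pickle.load(f)

if loaded["sklearn_version"] != sklearn.__version__:
    print(f"Warning: model was saved with scikit-learn "
          f"{loaded['sklearn_version']}, but {sklearn.__version__} is installed")

model = loaded["model"]
```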
To load the saved model from disk, you can use the pickle.load() function:
```python
import pickle

# load the saved model from disk
with open('svm_model.pkl', 'rb') as f:
    svm = pickle.load(f)

# use the loaded model to make predictions on new data
new_data = [[5.1, 3.5, 1.4, 0.2], [6.2, 3.4, 5.4, 2.3]]
prediction = svm.predict(new_data)
print(prediction)
```
In this example, we use pickle.load() to load the SVM model from the file svm_model.pkl. We then use the loaded model to make predictions on two new samples and print the predictions.
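To confirm that the round trip through pickle preserves the fitted model, you can compare predictions from the original and the restored estimator on the same data. This is a minimal sketch that writes to a temporary directory (via Python's tempfile module) instead of the working directory; the variable names restored and same_predictions are just illustrative.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# train the same model as above (fit() returns the estimator itself)
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0
)
svm = SVC(kernel='linear', C=1.0).fit(X_train, y_train)

# save and reload inside a temporary directory that is cleaned up afterwards
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'svm_model.pkl')
    with open(path, 'wb') as f:
        pickle.dump(svm, f)
    with open(path, 'rb') as f:
        restored = pickle.load(f)

# the restored model should reproduce the original predictions exactly
same_predictions = np.array_equal(svm.predict(X_test), restored.predict(X_test))
print("Predictions match:", same_predictions)
```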
Keep in mind that pickle is not the only way to save scikit-learn models. You can also use joblib, whose joblib.dump() and joblib.load() functions are more efficient than pickle for objects that carry large NumPy arrays internally, as fitted scikit-learn estimators often do.
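A minimal joblib version of the same save-and-load workflow might look like the following. It assumes joblib is installed (it is a dependency of scikit-learn); the file name svm_model.joblib and the compression level compress=3 are illustrative choices, not requirements.

```python
import joblib
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# train the same model as above
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0
)
svm = SVC(kernel='linear', C=1.0).fit(X_train, y_train)

# joblib.dump takes the object and a filename; compress=3 is an optional
# zlib compression level (0 disables compression)
joblib.dump(svm, 'svm_model.joblib', compress=3)

# joblib.load returns the reconstructed estimator
svm_loaded = joblib.load('svm_model.joblib')
print(svm_loaded.predict([[5.1, 3.5, 1.4, 0.2], [6.2, 3.4, 5.4, 2.3]]))
```

As with pickle, only load joblib files from sources you trust, because loading them can execute arbitrary code.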