How to use cross-validation in scikit-learn?

Published on Aug. 22, 2023, 12:18 p.m.

To run cross-validation in scikit-learn, call the cross_val_score() function, which fits and scores a model on several train/test splits of the dataset. Here’s an example that uses cross_val_score() to perform 5-fold cross-validation on an SVM model:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# load the iris dataset
iris = load_iris()

# create an SVM model
svm = SVC(kernel='linear', C=1.0)

# evaluate the model using cross-validation
scores = cross_val_score(svm, iris.data, iris.target, cv=5)

# print the mean score and twice the standard deviation of the scores
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))

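In this example, we load the iris dataset and create an SVM model (SVC()) with a linear kernel and C=1.0. We use cross_val_score() to evaluate the model using 5-fold cross-validation, and store the resulting scores, one per fold, in the scores variable. Because no scoring argument is given, each score comes from the model's own score() method, which for SVC is classification accuracy. Finally, we print the mean score along with twice the standard deviation of the scores, as a rough indication of how much the accuracy varies between folds.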
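
To make the contents of the scores variable more concrete, here is a minimal sketch that prints the score of each fold separately. It reuses the dataset and model from the example above; the loop variable names are only illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

iris = load_iris()
svm = SVC(kernel='linear', C=1.0)

# cross_val_score returns a NumPy array with one score per fold
scores = cross_val_score(svm, iris.data, iris.target, cv=5)

# print the accuracy obtained on each of the 5 held-out folds
for fold, score in enumerate(scores, start=1):
    print("Fold %d accuracy: %0.3f" % (fold, score))
```

Looking at the individual fold scores can reveal whether one split behaves very differently from the others, which the mean alone would hide.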

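Note that when cv is an integer, it specifies the number of folds to use for cross-validation. For a classifier such as SVC, scikit-learn then uses stratified folds (StratifiedKFold), so each fold keeps roughly the same class proportions as the full dataset. You can also specify a different cross-validation strategy, such as a KFold object, by passing it as the cv parameter.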
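
As a sketch of passing a KFold object as cv, the example below builds a 5-fold splitter by hand. The shuffle=True and random_state=0 settings are choices made for this illustration, not part of the original example: the iris samples are ordered by class, so an unshuffled KFold would produce folds that are missing some classes from their training data.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.svm import SVC

iris = load_iris()
svm = SVC(kernel='linear', C=1.0)

# iris is sorted by class, so shuffle the samples before splitting;
# random_state makes the shuffle reproducible
kfold = KFold(n_splits=5, shuffle=True, random_state=0)

# pass the splitter object instead of an integer
kfold_scores = cross_val_score(svm, iris.data, iris.target, cv=kfold)

print("Accuracy: %0.2f (+/- %0.2f)" % (kfold_scores.mean(), kfold_scores.std() * 2))
```

Passing a splitter object gives you control over shuffling and reproducibility that an integer cv value does not expose.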