How to use the Train/Test Split method in scikit-learn?

Published on Aug. 22, 2023, 12:18 p.m.

To use the train/test split method in scikit-learn, you can follow these steps:

1. First, import `train_test_split` and load your dataset. This example uses the iris dataset that ships with scikit-learn.

from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris

iris = load_iris() # load the iris dataset
X = iris.data # feature matrix
y = iris.target # target vector


2. Next, split your data into training and testing sets using the `train_test_split()` function.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

In this example, `test_size=0.2` holds out 20% of the samples for testing and leaves the remaining 80% for training. Setting `random_state=42` fixes the random seed, so the same shuffled split is produced on every run.
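
To make the effect of these arguments concrete, the sketch below checks the sizes of the resulting sets. It also shows an optional variation that is not part of the example above: passing `stratify=y` so that each class keeps the same proportion in both sets. The variable names ending in `_s` are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # 150 samples, 3 balanced classes

# Same call as above: 20% of the rows are held out for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)

# Variation (an assumption, not part of the original example):
# stratify=y preserves the class proportions in both splits.
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(np.bincount(y_test_s))  # per-class counts in the stratified test set
```

Stratifying is mostly useful when classes are imbalanced or the dataset is small, since a purely random split can leave a class under-represented in the test set.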

3. Now you can train your machine learning model on the training set, for example a DecisionTreeClassifier:

from sklearn.tree import DecisionTreeClassifier

classifier = DecisionTreeClassifier()
classifier.fit(X_train, y_train)


4. Finally, evaluate the performance of your model on the testing set:

accuracy = classifier.score(X_test, y_test)
print("Accuracy:", accuracy)
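
For a classifier, `score()` returns the mean accuracy on the data it is given. As a rough sketch of what that means, the following reproduces the same number by calling `predict()` and `accuracy_score` explicitly. The `random_state=0` passed to the tree is an assumption added here so that repeated runs build the same model; it is not required.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit on the training set only; the test set stays unseen until evaluation.
classifier = DecisionTreeClassifier(random_state=0)
classifier.fit(X_train, y_train)

# Fraction of test samples whose predicted label matches the true label.
y_pred = classifier.predict(X_test)
manual_accuracy = accuracy_score(y_test, y_pred)

# The built-in shortcut computes the same quantity.
builtin_accuracy = classifier.score(X_test, y_test)
print(manual_accuracy, builtin_accuracy)
```

Keeping the predictions in `y_pred` is handy if you later want other metrics from `sklearn.metrics`, such as a confusion matrix.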



By following these steps, you can use a train/test split in scikit-learn to estimate how well your machine learning model performs on data it did not see during training.