How to use the Train/Test Split method in scikit-learn?
Published on Aug. 22, 2023, 12:18 p.m.
To use the train/test split method in scikit-learn, you can follow these steps:
1. First, import the necessary modules and load your dataset. This example uses the iris dataset that ships with scikit-learn.
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
iris = load_iris() # load the iris dataset
X = iris.data # feature matrix
y = iris.target # target vector
2. Next, split your data into training and testing sets using the `train_test_split()` function.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
In this example, `test_size=0.2` holds out 20% of the data for testing and leaves the remaining 80% for training, while `random_state=42` fixes the random seed so that the same split is produced on every run.
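To make the effect of these two parameters concrete, here is a minimal sketch that checks the sizes of the resulting subsets and confirms that a fixed seed reproduces the split. The `return_X_y=True` shortcut and the `stratify=y` option are not part of the steps above; they are included here only as an illustration, and the variable names (`same_split`, `y_test_strat`) are made up for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load features and labels directly as arrays (150 samples, 4 features).
X, y = load_iris(return_X_y=True)

# Hold out 20% of the samples for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)

# Calling the function again with the same random_state gives the same split.
X_train_again, X_test_again, _, _ = train_test_split(
    X, y, test_size=0.2, random_state=42
)
same_split = np.array_equal(X_test, X_test_again)
print(same_split)  # True

# Optional (an addition to the steps above): stratify=y keeps the class
# proportions the same in the training and testing subsets.
_, _, _, y_test_strat = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(np.bincount(y_test_strat))  # [10 10 10]
```

Stratifying is worth considering for small or imbalanced datasets, where a purely random split could leave a class under-represented in the test set.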
3. Now you can train your machine learning model on the training set, for example a `DecisionTreeClassifier`:
from sklearn.tree import DecisionTreeClassifier
classifier = DecisionTreeClassifier()
classifier.fit(X_train, y_train)
4. Finally, evaluate the performance of your model on the testing set:
accuracy = classifier.score(X_test, y_test)
print("Accuracy:", accuracy)
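For a classifier, `score()` returns the mean accuracy on the data it is given. The sketch below puts the four steps together and checks that computing the accuracy by hand from the predictions gives the same number. Passing `random_state=0` to the classifier is an assumption added here so that the fitted tree is repeatable, and the variable names `manual_accuracy`, `score_accuracy`, and `train_accuracy` are illustrative only.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# random_state on the classifier is an assumption made for repeatability.
classifier = DecisionTreeClassifier(random_state=0)
classifier.fit(X_train, y_train)

# Accuracy computed explicitly from the predictions on the test set...
y_pred = classifier.predict(X_test)
manual_accuracy = accuracy_score(y_test, y_pred)

# ...matches what score() reports for the same data.
score_accuracy = classifier.score(X_test, y_test)

# Scoring on the training set as well shows how much the model overfits.
train_accuracy = classifier.score(X_train, y_train)

print("Test accuracy:", score_accuracy)
print("Train accuracy:", train_accuracy)
```

A large gap between the training and testing accuracy is a common sign that the model has memorized the training data rather than learned patterns that generalize.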
By following these steps, you can use a train/test split in scikit-learn to estimate how well your machine learning model performs on data it did not see during training.