How to remove stop words using NLTK?

Published on Aug. 22, 2023, 12:19 p.m.

To remove stop words using NLTK in Python, you can follow these steps:

  1. Install the NLTK library if it’s not already installed on your system.
pip install nltk
  2. Import NLTK and download the stopwords corpus, along with the Punkt tokenizer data that word_tokenize relies on.
import nltk
nltk.download('stopwords')
nltk.download('punkt')  # newer NLTK releases may ask for 'punkt_tab' instead
  3. Load the stopwords corpus and create a set of stop words.
stop_words = set(nltk.corpus.stopwords.words('english'))
  4. Tokenize your text into words and filter out the stop words using a list comprehension.
from nltk.tokenize import word_tokenize
words = word_tokenize("This is an example sentence to remove stopwords from")
# Compare in lowercase so that capitalized words such as "This" are also removed
filtered_words = [word for word in words if word.lower() not in stop_words]

Here, filtered_words will contain the list of tokens with the stop words removed.
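
The filtering step can also be wrapped in a small reusable function. The sketch below is only an illustration: the name remove_stop_words and the tiny hand-picked stop list are assumptions made for this example (NLTK's English list is much longer), and it runs without any downloads so the logic is easy to check on its own.

```python
def remove_stop_words(tokens, stop_words):
    """Return the tokens whose lowercase form is not in stop_words."""
    # A set gives fast membership checks; lowercase it so matching is case-insensitive.
    stop_set = {w.lower() for w in stop_words}
    return [tok for tok in tokens if tok.lower() not in stop_set]


# Tiny hand-picked stop list, for illustration only (not NLTK's list).
demo_stop_words = ["this", "is", "an", "to", "from"]
demo_tokens = ["This", "is", "an", "example", "sentence",
               "to", "remove", "stopwords", "from"]

print(remove_stop_words(demo_tokens, demo_stop_words))
# ['example', 'sentence', 'remove', 'stopwords']
```

In practice you would pass the output of word_tokenize as tokens and the NLTK stop word list as stop_words.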

Alternatively, you can use the stop_words module, a separate package (pip install stop-words), to remove stop words from text.

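from stop_words import get_stop_words
text = "This is an example sentence to remove stopwords from"
stop_words = set(get_stop_words('en'))  # a set makes the membership test faster
filtered_text = [word for word in text.split() if word.casefold() not in stop_words]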

Either way, filtered_words or filtered_text will be a list of the words from the original text with the stop words removed. To turn it back into a string, join the words with a space, for example " ".join(filtered_words).
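
One caveat with the text.split() approach: splitting on whitespace leaves punctuation attached to words, so a token like "is." will not match the stop word "is". The sketch below shows one possible workaround using only the standard library. The function name split_and_filter and the short stop list are assumptions for illustration, not part of either library.

```python
import string


def split_and_filter(text, stop_words):
    """Split on whitespace, strip surrounding punctuation, then drop stop words."""
    stop_set = {w.casefold() for w in stop_words}
    kept = []
    for raw in text.split():
        word = raw.strip(string.punctuation)  # "example," becomes "example"
        if word and word.casefold() not in stop_set:
            kept.append(word)
    return kept


# Short hand-picked stop list, for illustration only.
demo_stop_words = ["an", "this", "is"]
demo_text = "An example, this is."

# Plain split() keeps "is." because the trailing period blocks the match.
naive = [w for w in demo_text.split() if w.casefold() not in set(demo_stop_words)]
print(naive)                                         # ['example,', 'is.']
print(split_and_filter(demo_text, demo_stop_words))  # ['example']
```

NLTK's word_tokenize avoids this particular problem because it splits punctuation into separate tokens, although those punctuation tokens then remain in the filtered list unless you remove them as well.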