How to remove stop words using NLTK?
Published on Aug. 22, 2023, 12:19 p.m.
To remove stop words using NLTK in Python, you can follow these steps:
- Install the NLTK library if it is not already installed on your system.
pip install nltk
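- Import NLTK and download the stopwords corpus, plus the Punkt tokenizer models that word_tokenize needs. Newer NLTK releases load these models from the punkt_tab resource.
import nltk
nltk.download('stopwords')  # stop-word lists for several languages
nltk.download('punkt')      # tokenizer models used by word_tokenize (older NLTK releases)
nltk.download('punkt_tab')  # the same models in the format newer NLTK releases expect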
- Load the stopwords corpus and create a set of stop words.
stop_words = set(nltk.corpus.stopwords.words('english'))
- Tokenize your text into words and filter out the stop words using a list comprehension.
from nltk.tokenize import word_tokenize
words = word_tokenize("This is an example sentence to remove stopwords from")
filtered_words = [word for word in words if word.lower() not in stop_words]
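Here, filtered_words will contain the tokens of the sentence with the stop words removed. Because NLTK's stop-word list is lower-case, the comparison uses word.lower() so that capitalized stop words such as "This" are filtered as well.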
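The steps above can be wrapped into a small reusable helper. This is a minimal sketch under a few assumptions: `remove_stop_words` is a hypothetical name, the tokenizer is passed in as a parameter (in real use you would pass `nltk.tokenize.word_tokenize`), and a tiny hand-written stop-word set stands in for NLTK's English list so the example runs without downloading any corpus. It also drops punctuation-only tokens, which `word_tokenize` returns as separate tokens, and joins the remaining words back into a string.

```python
from typing import Callable, Iterable


def remove_stop_words(
    text: str,
    stop_words: Iterable[str],
    tokenize: Callable[[str], list[str]] = str.split,
) -> str:
    """Return `text` with stop words and punctuation-only tokens removed.

    Hypothetical helper: in real use, pass nltk.tokenize.word_tokenize as
    `tokenize` and set(nltk.corpus.stopwords.words('english')) as `stop_words`.
    """
    # Lower-case the stop list once so membership tests are case-insensitive.
    stops = {w.lower() for w in stop_words}
    kept = [
        token
        for token in tokenize(text)
        # Skip stop words, and skip tokens with no letters or digits (e.g. "," or ".").
        if token.lower() not in stops and any(c.isalnum() for c in token)
    ]
    return " ".join(kept)


# Illustrative stand-in for NLTK's much longer English stop-word list.
DEMO_STOPS = {"this", "is", "an", "to", "from", "the"}

print(remove_stop_words("This is an example sentence to remove stopwords from", DEMO_STOPS))
# → example sentence remove stopwords
```

Taking the tokenizer as an argument keeps the helper testable without NLTK data while letting you swap in `word_tokenize` once the corpora are downloaded.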
Alternatively, you can use the third-party stop_words package (install it with pip install stop-words) to remove stop words from text.
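from stop_words import get_stop_words
stop_words = set(get_stop_words('en'))  # get_stop_words returns a list; a set makes lookups faster
text = "This is an example sentence to remove stopwords from"
filtered_text = [word for word in text.split() if word.casefold() not in stop_words]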
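One caveat with `text.split()` is that punctuation stays attached to words, so a token such as `"from."` will not match the stop word `"from"`. The sketch below shows one possible workaround, assuming a simple regular-expression tokenizer is good enough for your text (it is not equivalent to NLTK's `word_tokenize`). A small hypothetical stop-word set takes the place of `get_stop_words('en')` so the example runs on its own.

```python
import re

# Hypothetical stand-in for set(get_stop_words('en')); the real list is much longer.
stop_words = {"this", "is", "an", "to", "from"}

text = "This is an example sentence to remove stopwords from."

# Naive whitespace split: "from." keeps its period, so it is not filtered out.
naive = [w for w in text.split() if w.casefold() not in stop_words]

# Regex tokenization: \w+ matches runs of letters, digits and underscores,
# so trailing punctuation is dropped before the stop-word lookup.
tokens = re.findall(r"\w+", text)
filtered_text = [w for w in tokens if w.casefold() not in stop_words]

print(naive)          # ['example', 'sentence', 'remove', 'stopwords', 'from.']
print(filtered_text)  # ['example', 'sentence', 'remove', 'stopwords']
```

Note that `\w+` also splits contractions such as "don't" into "don" and "t", which is one reason a dedicated tokenizer may still be the better choice.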
Either way, the resulting filtered_words or filtered_text list will contain the words of the original text with the stop words removed.
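Because both approaches produce an ordinary Python collection of stop words, you can adjust it before filtering. For instance, sentiment-analysis tasks often keep negations such as "not", "no" and "nor", which NLTK's English list includes. The sketch below uses a small hypothetical set in place of the real list and applies set difference and union; "movie" is an arbitrary domain-specific word chosen only for illustration.

```python
# Hypothetical stand-in for set(nltk.corpus.stopwords.words('english')).
base_stop_words = {"this", "is", "not", "a", "the", "no", "nor"}

# Keep negations, which can flip the meaning of a sentence.
negations = {"not", "no", "nor"}
# Also drop an illustrative domain-specific word.
custom_stop_words = (base_stop_words - negations) | {"movie"}

sentence = "This movie is not a good one"
filtered = [w for w in sentence.split() if w.lower() not in custom_stop_words]
print(filtered)  # ['not', 'good', 'one']
```

Without the adjustment, "not" would be removed and the filtered words would read as positive.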