How to stem or lemmatize words using NLTK?

Published on Aug. 22, 2023, 12:19 p.m.

To stem or lemmatize words using NLTK in Python, you can follow these steps:

  1. Install the NLTK library if it’s not already installed on your system.
pip install nltk
  2. Import NLTK and download the WordNet corpus (used by the lemmatizer) and the Punkt tokenizer models (used by word_tokenize).
import nltk
nltk.download('wordnet')
nltk.download('punkt')  # newer NLTK releases may ask for 'punkt_tab' instead
  3. Initialize the stemmer and lemmatizer objects. NLTK provides several options, such as the Porter stemmer and the WordNet lemmatizer.
from nltk.stem import PorterStemmer, WordNetLemmatizer
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
  4. Tokenize your text into words and apply the stemmer or lemmatizer to each word using a list comprehension.
from nltk.tokenize import word_tokenize
text = "Stemming and lemmatization are important techniques in natural language processing"
words = word_tokenize(text)
stemmed_words = [stemmer.stem(word) for word in words]
lemmatized_words = [lemmatizer.lemmatize(word) for word in words]

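Here, stemmed_words holds the Porter stem of each token and lemmatized_words holds its WordNet lemma. Note that lemmatize() treats every word as a noun unless you pass a pos argument, so verbs such as "are" are returned unchanged by default.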
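
Because lemmatize() assumes a noun by default, passing a part-of-speech hint usually gives better lemmas. The following is a minimal sketch of one way to do that, not the only one. The helper names to_wordnet_pos, lemmatize_tagged and demo are hypothetical, and demo assumes NLTK's tokenizer, POS tagger and WordNet data have already been downloaded (the tagger resource name varies between NLTK releases).

```python
# Penn Treebank tag prefix -> WordNet part-of-speech code.
# WordNet uses 'n' (noun), 'v' (verb), 'a' (adjective), 'r' (adverb).
PENN_PREFIX_TO_WORDNET = {"J": "a", "V": "v", "N": "n", "R": "r"}


def to_wordnet_pos(penn_tag):
    """Map a Penn Treebank tag such as 'VBP' to a WordNet POS code (noun by default)."""
    return PENN_PREFIX_TO_WORDNET.get(penn_tag[:1], "n")


def lemmatize_tagged(tagged_tokens, lemmatizer):
    """Lemmatize (word, penn_tag) pairs with any object exposing lemmatize(word, pos)."""
    return [lemmatizer.lemmatize(word, to_wordnet_pos(tag)) for word, tag in tagged_tokens]


def demo():
    """Run the helpers with NLTK; requires the NLTK data mentioned above."""
    import nltk
    from nltk.stem import WordNetLemmatizer
    from nltk.tokenize import word_tokenize

    text = "Stemming and lemmatization are important techniques in natural language processing"
    tagged = nltk.pos_tag(word_tokenize(text))  # list of (word, Penn Treebank tag) pairs
    return lemmatize_tagged(tagged, WordNetLemmatizer())
```

Calling demo() returns the lemmatized tokens as a list. The mapping falls back to noun for any tag it does not recognize, which matches the lemmatizer's own default.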

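Alternatively, you can use the standalone stemming package (pip install stemming), whose porter2 module provides a stem function. It stems one word at a time, so split the text into words first.

from stemming.porter2 import stem
text = 'stemming and lemmatization are important techniques in natural language processing'
# stem() expects a single word, so apply it per word and rejoin
print(' '.join(stem(word) for word in text.split()))

# Not part of NLTK: this assumes a separate package exposing lemmatization.lemmatize is installed
from lemmatization.lemmatize import lemmatize
print(lemmatize(text))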

Either way, the result is the original text with stemming or lemmatization applied to each word: the stemmed_words and lemmatized_words lists from NLTK, or the printed output of the standalone functions.
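
Stemmers and lemmatizers generally work on one word at a time, so a whole sentence has to be split before they are applied. A minimal sketch of that pattern follows. The names map_words and strip_ing are hypothetical, the regex tokenizer is a simple stand-in for word_tokenize, and strip_ing is a toy function used only to make the example self-contained, not a real stemming algorithm.

```python
import re


def map_words(text, word_fn):
    """Apply word_fn to every alphabetic token in text and rejoin the results with spaces."""
    tokens = re.findall(r"[A-Za-z]+", text)
    return " ".join(word_fn(token) for token in tokens)


def strip_ing(word):
    """Toy stand-in for a stemmer: drop a trailing 'ing' from words longer than five letters."""
    return word[:-3] if word.endswith("ing") and len(word) > 5 else word


print(map_words("Stemming and processing text", strip_ing))
```

In practice you would pass a real word-level function, such as stemmer.stem or lemmatizer.lemmatize, in place of strip_ing.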
