How to visualize text data using NLTK?

Published on Aug. 22, 2023, 12:19 p.m.

To visualize text data using NLTK in Python, you can follow these steps:

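  1. Install the necessary libraries (`pip install nltk matplotlib`), then import them and download the relevant corpora and datasets.

```python
import nltk
import matplotlib.pyplot as plt
from nltk import FreqDist
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize  # only needed when tokenizing your own raw text

# Tokenizer models for word_tokenize (newer NLTK releases may ask for 'punkt_tab' instead)
nltk.download('punkt')
nltk.download('stopwords')
# The movie reviews corpus used in the example below
nltk.download('movie_reviews')
```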
  2. Load your text data and preprocess it by lowercasing the tokens and removing stopwords and non-alphabetic tokens.

```python
stop_words = set(stopwords.words('english'))

# Load your text data (in this example, movie reviews, which NLTK provides already tokenized)
movie_reviews = nltk.Text(nltk.corpus.movie_reviews.words())

# Lowercase each token, then drop stopwords and non-alphabetic tokens
filtered_words = [word.lower() for word in movie_reviews if word.lower() not in stop_words and word.isalpha()]
```


  3. Create a frequency distribution of the words using the `FreqDist()` function.

```python
fdist = FreqDist(filtered_words)
```

  4. Visualize the frequency distribution using a plot in matplotlib.

```python
plt.figure(figsize=(16, 5))
# show=False (accepted by recent NLTK releases) keeps the figure open so the labels below take effect
fdist.plot(50, show=False)
plt.xlabel('Word')
plt.ylabel('Frequency')
plt.title('Most common words in the movie reviews corpus')
plt.show()
```

This creates a plot of the 50 most common words in the movie reviews corpus. You can change the number of words displayed through the argument to `plot()`, and the figure size through the arguments to `figure()`.
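If you want more control over the styling than `fdist.plot()` offers, one option is to pull the counts out with `most_common()` and draw them with matplotlib directly. The following is a minimal sketch under a few assumptions: `SAMPLE_TEXT`, `SAMPLE_STOP_WORDS`, `build_freqdist`, and `plot_top_words` are hypothetical names introduced here for illustration, and a regular-expression tokenizer with a tiny hand-written stopword list stands in for the corpus and stopword list above so that the example runs without downloading any NLTK data.

```python
import matplotlib.pyplot as plt
from nltk import FreqDist
from nltk.tokenize import RegexpTokenizer

# Hypothetical sample text and stopword list, used only so this sketch
# runs without downloading any NLTK data.
SAMPLE_TEXT = (
    "The film was long, but the film had a good plot. "
    "The director kept the plot simple and the actors made the film work."
)
SAMPLE_STOP_WORDS = {"the", "was", "but", "had", "a", "and"}


def build_freqdist(text, stop_words):
    """Lowercase the text, keep alphabetic tokens, and drop stopwords."""
    tokens = RegexpTokenizer(r"[A-Za-z]+").tokenize(text.lower())
    return FreqDist(tok for tok in tokens if tok not in stop_words)


def plot_top_words(fdist, n=5):
    """Draw a horizontal bar chart of the n most common words."""
    words, counts = zip(*fdist.most_common(n))
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.barh(words, counts)
    ax.invert_yaxis()  # put the most frequent word at the top
    ax.set_xlabel("Frequency")
    ax.set_ylabel("Word")
    ax.set_title(f"Top {n} words")
    return fig, ax


fdist = build_freqdist(SAMPLE_TEXT, SAMPLE_STOP_WORDS)
fig, ax = plot_top_words(fdist, n=3)
```

Call `plt.show()` afterwards to display the chart. To apply the same idea to the movie reviews example, you could pass the `FreqDist` built from `filtered_words` to `plot_top_words()` instead.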

Alternatively, you can use NLTK’s `dispersion_plot()` method to show where a given set of words occurs across the text, as shown below:

```python
movie_reviews.dispersion_plot(["movie", "film", "actor", "director", "plot"])
```

This displays a plot with one row per word, marking each position in the movie reviews corpus at which “movie”, “film”, “actor”, “director”, or “plot” occurs. Dense clusters of marks indicate stretches of the text where a word is used often. You can customize the words displayed by modifying the list of words passed to `dispersion_plot()`.
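To make concrete what a dispersion plot draws, it helps to see the underlying data: each mark corresponds to the token position (offset from the start of the text) at which a word occurs. The sketch below computes those offsets by hand for a short token list. `word_offsets` is a hypothetical helper written for illustration, not part of NLTK, and it assumes exact, case-sensitive matching.

```python
def word_offsets(tokens, targets):
    """Map each target word to the list of token positions where it occurs."""
    offsets = {word: [] for word in targets}
    for position, token in enumerate(tokens):
        if token in offsets:  # exact, case-sensitive match (an assumption of this sketch)
            offsets[token].append(position)
    return offsets


tokens = "the film had a plot and the film was long".split()
offsets = word_offsets(tokens, ["film", "plot", "actor"])
print(offsets)
# → {'film': [1, 7], 'plot': [4], 'actor': []}
```

A word that never occurs, such as "actor" here, gets an empty list, which corresponds to an empty row in the plot.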

Overall, visualizing text data can give you valuable insights into the most common words and patterns in your text, helping you to better understand and analyze it.