How to visualize Word2Vec embeddings in Gensim?

Published on Aug. 22, 2023, 12:18 p.m.

To visualize Word2Vec embeddings trained with Gensim, reduce their dimensionality with the t-SNE algorithm (available in scikit-learn) and draw the resulting 2D points as a scatter plot. Here is an example code snippet:

import gensim
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Load trained Word2Vec model
model = gensim.models.Word2Vec.load('path/to/model')

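# Get vectors for a sample of words (every word must be in the model's vocabulary)
sample_words = ['cat', 'dog', 'bird', 'horse', 'fish', 'snake']
vectors = model.wv[sample_words]  # 2D array, one row per word

# Use t-SNE to reduce dimensionality to 2D
# perplexity must be smaller than the number of points (the default is 30)
tsne = TSNE(n_components=2, perplexity=min(30, len(sample_words) - 1), random_state=42)
vectors_2d = tsne.fit_transform(vectors)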

# Plot the words as points on a scatter plot
plt.scatter(vectors_2d[:, 0], vectors_2d[:, 1])
for i, word in enumerate(sample_words):
    plt.annotate(word, xy=(vectors_2d[i, 0], vectors_2d[i, 1]))
plt.show()

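This code draws a labeled scatter plot of the selected words. Words that appear in similar contexts in the training data should land close together, although with only a handful of points the layout is rough. t-SNE is stochastic, so the picture changes between runs unless you set random_state in TSNE. It mainly preserves local neighborhoods, so distances between far-apart groups should not be over-interpreted. You can adjust the figure size (for example with plt.figure(figsize=(8, 6)) before plotting) and other visualization parameters as desired.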
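
For word lists that may contain out-of-vocabulary words, the projection step can be wrapped in a small helper. The following is a minimal sketch, not part of Gensim or scikit-learn: the function name project_words and its parameters are hypothetical, and it assumes only that the object passed in supports `word in kv` and `kv[word]`, as `model.wv` does.

```python
import numpy as np
from sklearn.manifold import TSNE


def project_words(keyed_vectors, words, random_state=0):
    """Return (kept_words, coords), where coords is an (n, 2) t-SNE projection.

    Hypothetical helper: `keyed_vectors` is anything that supports
    `word in kv` and `kv[word]`, such as `model.wv` from a Word2Vec model.
    """
    # Skip words the model never saw; indexing them would raise KeyError.
    kept_words = [w for w in words if w in keyed_vectors]
    if len(kept_words) < 3:
        raise ValueError("need at least three in-vocabulary words to project")

    # Stack the word vectors into one 2D array, one row per word.
    vectors = np.array([keyed_vectors[w] for w in kept_words])

    # scikit-learn requires perplexity < number of samples (its default is 30).
    perplexity = min(30, len(kept_words) - 1)
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=random_state)
    return kept_words, tsne.fit_transform(vectors)
```

With the model loaded as above, `words, coords = project_words(model.wv, sample_words)` returns the words that were found and their 2D coordinates, which can be passed to plt.scatter and plt.annotate exactly as in the snippet. Capping the perplexity this way is only a convenience for small samples; for a few hundred words or more, a value between roughly 5 and 50 is the range usually suggested for t-SNE.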