How to implement linear regression in NumPy

Published on Aug. 22, 2023, 12:16 p.m.

To implement linear regression in NumPy, you first define your input matrix X and output vector y, and then use NumPy's linear algebra routines to solve for the coefficients of the linear regression model. Here is an example code snippet:

import numpy as np

# Define the input matrix X and output vector y
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.array([4, 5, 6, 7])

# Calculate the coefficients of the linear regression model
w = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)

# Print the coefficients
print(w)

In this code, we define the input matrix X as a 4x2 NumPy array, in which each row is one observation and each column is one input variable. We also define the output vector y as a 1-dimensional NumPy array with one target value per row of X.

We then use the dot() method of NumPy arrays to apply the normal equation, w = inv(X^T X) X^T y. We compute the product of X’s transpose and X, invert that result with np.linalg.inv(), multiply the inverse by X’s transpose, and finally multiply by y. This gives us the coefficients of the linear regression model, which are printed out.
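As a rough cross-check of the calculation, the sketch below compares the normal-equation result with np.linalg.lstsq(), which solves the same least-squares problem without forming the inverse explicitly. The names w_normal and w_lstsq are illustrative and do not appear in the original example.

```python
import numpy as np

X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.array([4, 5, 6, 7])

# Normal equation: invert X^T X, then multiply by X^T and by y
w_normal = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)

# Least-squares solver; rcond=None selects NumPy's default cutoff
w_lstsq, residuals, rank, singular_values = np.linalg.lstsq(X, y, rcond=None)

print(w_normal)
print(w_lstsq)
print(np.allclose(w_normal, w_lstsq))
```

For well-conditioned data like this, the two approaches should agree to within floating-point rounding. A least-squares solver is generally the more robust choice when the columns of X are close to linearly dependent.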

Note that the example above fits a model without an intercept term. To include an intercept, add a column of 1s to the input matrix X before calculating the coefficients. This can be done using the np.hstack() function to horizontally stack a column of 1s onto the input matrix.
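A minimal sketch of the intercept step, assuming the column of 1s is placed first so that the first coefficient is the intercept. The names ones, X_b, and w_b are illustrative.

```python
import numpy as np

X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.array([4, 5, 6, 7])

# Column of 1s with one row per observation
ones = np.ones((X.shape[0], 1))

# Put the intercept column first, followed by the original features
X_b = np.hstack([ones, X])

# Same normal-equation calculation, now on the augmented matrix
w_b = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)

# w_b[0] is the intercept; the remaining entries are the feature coefficients
intercept, coefs = w_b[0], w_b[1:]
print(intercept, coefs)
```

For this particular data the fit is exact: the intercept comes out at approximately 2 and each feature coefficient at approximately 1, up to floating-point rounding.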

How to save a NumPy linear regression model

To save a NumPy linear regression model, you can use the pickle module to serialize the trained coefficient vector into a binary file, which can be loaded later to make predictions on new data. Here is an example code snippet:

import numpy as np
import pickle

# Define the input matrix X and output vector y
X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.array([4, 5, 6, 7])

# Train the linear regression model
w = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)

# Save the model to disk using pickle
with open('linear_regression_model.pkl', 'wb') as f:
    pickle.dump(w, f)

# Load the model from disk using pickle
with open('linear_regression_model.pkl', 'rb') as f:
    w_loaded = pickle.load(f)

# Make predictions using the loaded model
X_new = np.array([[3, 3], [3, 4]])
y_pred = X_new.dot(w_loaded)
print(y_pred)

In this code, we first train a linear regression model on the input matrix X and output vector y, and calculate the coefficients w.

We then use the pickle.dump() function to serialize the trained model object w into a binary file named linear_regression_model.pkl.

To load the saved model from disk, we use the pickle.load() function to deserialize the binary file into a Python object w_loaded.

Finally, we make predictions on new input data X_new using the loaded model w_loaded, and print out the predictions.
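If the model is trained with an intercept column, new inputs need the same column of 1s before they are multiplied by the loaded coefficients. The sketch below shows one way this might look. The add_intercept() helper is hypothetical, and the file is written to a temporary directory only to keep the example self-contained.

```python
import os
import pickle
import tempfile

import numpy as np


def add_intercept(X):
    """Prepend a column of 1s to X (hypothetical helper, not from the article)."""
    return np.hstack([np.ones((X.shape[0], 1)), X])


X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.array([4, 5, 6, 7])

# Train on the augmented matrix so that w[0] is the intercept
X_b = add_intercept(X)
w = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y)

# Round-trip the coefficients through a pickle file in a temporary directory
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'linear_regression_model.pkl')
    with open(path, 'wb') as f:
        pickle.dump(w, f)
    with open(path, 'rb') as f:
        w_loaded = pickle.load(f)

# New inputs need the same column of 1s before multiplying by the coefficients
X_new = np.array([[3, 3], [3, 4]])
y_pred = add_intercept(X_new).dot(w_loaded)
print(y_pred)
```

Forgetting the column of 1s at prediction time raises a shape mismatch error, because X_new would have two columns while w_loaded has three entries.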

Note that pickle reads and writes bytes, so the file must be opened in binary mode: use 'wb' when calling pickle.dump() and 'rb' when calling pickle.load().
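Because the model here is just a NumPy array of coefficients, NumPy's own np.save() and np.load() functions are a possible alternative to pickle. This is not the approach used in the example above. It is a sketch under the assumption that only the coefficient array needs to be stored, and the file name linear_regression_weights.npy is illustrative.

```python
import os
import tempfile

import numpy as np

X = np.array([[1, 1], [1, 2], [2, 2], [2, 3]])
y = np.array([4, 5, 6, 7])
w = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'linear_regression_weights.npy')

    # Write the array in NumPy's binary .npy format
    np.save(path, w)

    # Read it back; allow_pickle defaults to False, so no pickled objects are loaded
    w_loaded = np.load(path)

# Predictions work the same way as with the pickled coefficients
X_new = np.array([[3, 3], [3, 4]])
y_pred = X_new.dot(w_loaded)
print(y_pred)
```

The .npy format stores the array's dtype and shape along with its data, so the loaded array can be used for predictions without any further conversion.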