Simple Linear Regression Python Notebook

Chapter 5: Machine Learning

Python Data Science / Page 390

Note:

  • Linear regression using Python's scikit-learn library.
  • See the CSC 578D notebook for a detailed explanation of linear regression and a custom implementation.
In [25]:
%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression
from IPython.display import Math
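import warnings
# silence a harmless LAPACK "internal gelsd" warning that older SciPy versions emit during least-squares fitting
warnings.filterwarnings(action="ignore", module="scipy", message="^internal gelsd")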
In [26]:
# generate the data
rng = np.random.RandomState(1) # seeded pseudo-random number generator (seed 1) for reproducible results

# simulate data for x and y
x = 10 * rng.rand(50) # 50 uniform random numbers in [0, 10)
y = 2 * x - 5 + rng.randn(50) # y = f(x) = 2x - 5 + ε, where ε is standard normal noise N(0, 1)

# plot the data
plt.scatter(x, y)
plt.grid(True)

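Linear Regression Model Formula:

$ y_i = \mbox{E}(Y_i) + \epsilon_i = \beta_0 + \beta_1 x_i + \epsilon_i $

where $\beta_0$ is the intercept, $\beta_1$ is the slope, and $\epsilon_i$ is a random error term with mean 0. For the simulated data above, $\beta_0 = -5$, $\beta_1 = 2$, and $\epsilon_i \sim N(0, 1)$.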
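The coefficients are estimated by ordinary least squares, which minimizes $\sum_i (y_i - \beta_0 - \beta_1 x_i)^2$ and, for a single feature, has the closed-form solution $\hat\beta_1 = \sum_i (x_i - \bar x)(y_i - \bar y) / \sum_i (x_i - \bar x)^2$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x$. As a rough sketch (not the CSC 578D custom implementation), these formulas can be evaluated directly with NumPy on the same simulated data; the helper name `ols_fit` is hypothetical, and `np.polyfit` is used only as an independent cross-check.

```python
import numpy as np

def ols_fit(x, y):
    """Hypothetical helper: closed-form simple linear regression, returns (beta0, beta1)."""
    x_mean, y_mean = x.mean(), y.mean()
    beta1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    beta0 = y_mean - beta1 * x_mean
    return beta0, beta1

# same simulated data as above: true beta0 = -5, beta1 = 2, noise ~ N(0, 1)
rng = np.random.RandomState(1)
x = 10 * rng.rand(50)
y = 2 * x - 5 + rng.randn(50)

beta0, beta1 = ols_fit(x, y)
print("beta0 =", beta0)
print("beta1 =", beta1)

# cross-check against NumPy's degree-1 polynomial fit (returns [slope, intercept])
slope, intercept = np.polyfit(x, y, 1)
```

The closed-form estimates should agree with the scikit-learn slope and intercept up to floating-point rounding, since both solve the same least-squares problem.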

In [27]:
model = LinearRegression(fit_intercept=True) # fit_intercept=True: also estimate the y-intercept β0
# fit the model; scikit-learn expects X as a 2-D array of shape (n_samples, n_features)
model.fit(x[:, np.newaxis], y)
print("Model slope (β1):", model.coef_[0])
print("Model intercept (β0):", model.intercept_)

# inspect the array shapes passed to fit()
print("")
print("Model fitting parameters:")
print("X Data Matrix {}".format(np.shape(x[:, np.newaxis])))
print("Y Data Matrix {}".format(np.shape(y)))
print("")

xfit = np.linspace(0, 10, 1000) # 1000 evenly spaced numbers [0,10]
yfit = model.predict(xfit[:, np.newaxis]) # make the predictions based on the model generated previously

# generate both plots: scatter and line
plt.scatter(x, y)
plt.plot(xfit, yfit)
plt.grid(True)
Model slope (β1): 2.0272088103606953
Model intercept (β0): -4.998577085553204

Model fitting parameters:
X Data Matrix (50, 1)
Y Data Matrix (50,)
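The shapes above reflect two conventions of the fitting API. A minimal sketch on the same simulated data: `x[:, np.newaxis]` and `x.reshape(-1, 1)` both produce the required `(50, 1)` matrix, a plain 1-D array is rejected with a `ValueError`, and `fit_intercept=True` gives the same coefficients as an ordinary least-squares solve with an explicit column of ones in the design matrix. The variable names `A`, `coeffs`, and `rejected_1d` are illustrative only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(1)
x = 10 * rng.rand(50)
y = 2 * x - 5 + rng.randn(50)

# scikit-learn expects X with shape (n_samples, n_features); these two forms are equivalent
X = x[:, np.newaxis]
assert np.array_equal(X, x.reshape(-1, 1))

# passing the 1-D array directly is rejected by input validation
try:
    LinearRegression().fit(x, y)
    rejected_1d = False
except ValueError:
    rejected_1d = True

# fit_intercept=True corresponds to adding a column of ones to the design matrix
A = np.column_stack([np.ones_like(x), x])  # shape (50, 2): [1, x_i]
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)  # coeffs = [beta0, beta1]

model = LinearRegression(fit_intercept=True).fit(X, y)
print("1-D input rejected:", rejected_1d)
print("lstsq   (beta0, beta1):", coeffs)
print("sklearn (beta0, beta1):", model.intercept_, model.coef_[0])
```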

In [28]:
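# make an individual prediction based on the model
# note: x = 12 lies outside the training range [0, 10), so this is an extrapolation
prediction = model.predict(np.array([[12]]))[0]
print("Individual Prediction:")
print("For x=12, ŷ={}".format(prediction))

# extend the fitted line to x = 12 so it reaches the new prediction
xfit = np.linspace(0, 12, 1000)
yfit = model.predict(xfit[:, np.newaxis])

# plot the observed data, and mark the predicted point separately
plt.scatter(x, y)
plt.scatter([12], [prediction], color="red", marker="x")
plt.plot(xfit, yfit)
plt.grid(True)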
Individual Prediction:
For x=12, ŷ=19.327928638775138
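`model.predict` evaluates the fitted line $\hat{y} = \hat\beta_0 + \hat\beta_1 x$. A minimal sketch, refitting on the same simulated data, checks this by hand for $x = 12$ and shows that `predict` also accepts several new points at once as an `(n, 1)` array; the query values in `x_new` are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.RandomState(1)
x = 10 * rng.rand(50)
y = 2 * x - 5 + rng.randn(50)
model = LinearRegression(fit_intercept=True).fit(x[:, np.newaxis], y)

# the prediction is just the line equation evaluated at x = 12
by_hand = model.intercept_ + model.coef_[0] * 12
from_model = model.predict(np.array([[12]]))[0]
print("by hand:", by_hand, " predict():", from_model)

# several new points at once: shape (n_samples, 1) in, shape (n_samples,) out
x_new = np.array([0.0, 5.0, 12.0])
y_new = model.predict(x_new.reshape(-1, 1))
print(y_new)
```

With the fitted values above, the hand computation is roughly $-4.9986 + 2.0272 \times 12 \approx 19.328$, matching the cell output.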