sklearn.linear_model.LinearRegression

From GM-RKB

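A sklearn.linear_model.LinearRegression is a linear least-squares regression system within the sklearn.linear_model module. It fits a linear model by ordinary least squares, choosing the coefficients that minimize the residual sum of squares between the observed targets and the model's predictions.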
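The minimal sketch below makes "linear least-squares" concrete. It fits LinearRegression to a small synthetic dataset (invented for illustration, with true intercept 3 and coefficients 2 and -1). It then checks the fitted intercept_ and coef_ against a direct ordinary least-squares solve with numpy.linalg.lstsq, using a design matrix augmented with a column of ones.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption for illustration): y = 3 + 2*x1 - 1*x2 + small noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 + X @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=100)

# Fit with scikit-learn (fit_intercept=True is the default)
model = LinearRegression().fit(X, y)

# Solve the same ordinary least-squares problem directly:
# prepend a column of ones so the first coefficient plays the role of the intercept
X1 = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)

print("sklearn:", model.intercept_, model.coef_)
print("lstsq:  ", beta[0], beta[1:])
```

The column of ones is how the intercept enters the least-squares problem. With fit_intercept=False, the comparable direct solve would use X alone.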

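1) Import the LinearRegression model from scikit-learn: from sklearn.linear_model import LinearRegression
2) Create a design matrix X and a response vector Y.
3) Create a LinearRegression object: model = LinearRegression([fit_intercept=True, copy_X=True, n_jobs=None, positive=False]). Older releases also accepted normalize=False; that parameter was deprecated in version 1.0 and removed in version 1.2.
4) Choose method(s): fit(X, y[, sample_weight]) to fit the model, predict(X) to predict with it, score(X, y[, sample_weight]) to return the R^2 coefficient of determination, and get_params([deep]) / set_params(**params) to read or set the estimator's parameters.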
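The four steps above can be sketched end to end as follows. The tiny noise-free dataset is made up for illustration. The constructor arguments are just the current defaults written out explicitly. Treat this as a sketch, not a reference usage.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Step 2: a small design matrix X (n_samples x n_features) and response vector Y.
# Values are made up for illustration; Y = 1 + 2*x1 + 3*x2 exactly (no noise).
X = np.array([[0.0, 1.0],
              [1.0, 0.0],
              [2.0, 1.0],
              [3.0, 2.0],
              [4.0, 0.0]])
Y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1]

# Step 3: create the estimator, spelling out the default constructor arguments
model = LinearRegression(fit_intercept=True, copy_X=True, n_jobs=None)

# Step 4: apply its methods
model.fit(X, Y)               # estimates coef_ and intercept_
Y_hat = model.predict(X)      # predictions for the rows of X
r2 = model.score(X, Y)        # coefficient of determination R^2
params = model.get_params()   # constructor arguments as a dict

print("coef_:", model.coef_)            # approximately [2. 3.]
print("intercept_:", model.intercept_)  # approximately 1.0
print("R^2:", r2)                       # approximately 1.0 (noise-free data)
```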
Input:
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; requires an older release
from sklearn.metrics import explained_variance_score, mean_squared_error
import numpy as np
import pylab as pl  # matplotlib's pylab interface, used to draw the figure
boston = load_boston()  # Load the Boston housing dataset
x = boston.data  # Regression design matrix
y = boston.target  # Target (response) vector
linreg = LinearRegression()  # Create linear regression object
linreg.fit(x, y)  # Fit linear regression
yp = linreg.predict(x)  # In-sample predicted values
yp_cv = cross_val_predict(linreg, x, y, cv=10)  # 10-fold cross-validated predictions
[Figure: linear boston10fold.png. Blue dots correspond to the 10-fold CV predictions.]

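# Calculation of RMSE and explained variance

Evariance = explained_variance_score(y, yp)
Evariance_cv = explained_variance_score(y, yp_cv)
RMSE = np.sqrt(mean_squared_error(y, yp))
RMSECV = np.sqrt(mean_squared_error(y, yp_cv))
print("Method: Linear Regression")
print("RMSE on the dataset: %.4f" % RMSE)
print("RMSE on 10-fold CV: %.4f" % RMSECV)
print("Explained Variance Regression Score on the dataset: %.4f" % Evariance)
print("Explained Variance Regression Score on 10-fold CV: %.4f" % Evariance_cv)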
Output:
Method: Linear Regression
RMSE on the dataset: 4.6795
RMSE on 10-fold CV: 5.8819
Explained Variance Regression Score on the dataset: 0.7406
Explained Variance Regression Score on 10-fold CV: 0.5902
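load_boston is no longer available in current scikit-learn. The sketch below therefore repeats the same in-sample vs. 10-fold CV comparison on the bundled diabetes dataset. This is a substitution, so its numbers will differ from the Boston results above. It also recomputes the explained variance directly as 1 - Var(y - y_hat) / Var(y), to show what explained_variance_score measures. The in-sample scores are optimistic because each row is predicted by a model that was fit on that row. cross_val_predict instead predicts each row with a model trained on the other folds, which is why the CV RMSE is normally the larger of the two.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import explained_variance_score, mean_squared_error
from sklearn.model_selection import cross_val_predict

# Assumption: load_diabetes stands in for load_boston (removed in scikit-learn 1.2)
X, y = load_diabetes(return_X_y=True)
linreg = LinearRegression()

yp = linreg.fit(X, y).predict(X)                # in-sample predictions
yp_cv = cross_val_predict(linreg, X, y, cv=10)  # each row predicted by a model not trained on it

def explained_variance(y_true, y_pred):
    """1 - Var(y - y_hat) / Var(y), the quantity explained_variance_score reports."""
    return 1.0 - np.var(y_true - y_pred) / np.var(y_true)

rmse = np.sqrt(mean_squared_error(y, yp))
rmse_cv = np.sqrt(mean_squared_error(y, yp_cv))
ev = explained_variance_score(y, yp)
ev_cv = explained_variance_score(y, yp_cv)

print("RMSE on the dataset: %.4f" % rmse)
print("RMSE on 10-fold CV: %.4f" % rmse_cv)
print("Explained variance on the dataset: %.4f" % ev)
print("Explained variance on 10-fold CV: %.4f" % ev_cv)
```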


References

2017a

2017b

# Split the targets into training/testing sets
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]

# Create linear regression object
regr = linear_model.LinearRegression()

# Train the model using the training sets
regr.fit(diabetes_X_train, diabetes_y_train)
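The excerpt above does not show how diabetes, diabetes_X_train and linear_model are defined. A self-contained sketch follows. The dataset loading, the single-feature choice (column 2) and the evaluation lines are assumptions, modeled on what we believe the scikit-learn ordinary-least-squares gallery example does; they are not part of the quoted text.

```python
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score

# Load the diabetes dataset (442 samples, 10 features)
diabetes = datasets.load_diabetes()

# Assumption: use a single feature (column 2), kept 2-D with shape (n_samples, 1)
diabetes_X = diabetes.data[:, np.newaxis, 2]

# Split the data into training/testing sets (last 20 rows held out)
diabetes_X_train = diabetes_X[:-20]
diabetes_X_test = diabetes_X[-20:]

# Split the targets into training/testing sets
diabetes_y_train = diabetes.target[:-20]
diabetes_y_test = diabetes.target[-20:]

# Create linear regression object
regr = linear_model.LinearRegression()

# Train the model using the training sets
regr.fit(diabetes_X_train, diabetes_y_train)

# Predict on the held-out rows and evaluate
diabetes_y_pred = regr.predict(diabetes_X_test)
print("Coefficients:", regr.coef_)
print("Mean squared error: %.2f" % mean_squared_error(diabetes_y_test, diabetes_y_pred))
print("R^2 on test set: %.2f" % r2_score(diabetes_y_test, diabetes_y_pred))
```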

2017d

2017e

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# `data` is a scikit-learn dataset object (with .data and .target) loaded earlier; not shown in this excerpt
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target)
clf = LinearRegression()
clf.fit(X_train, y_train)
predicted = clf.predict(X_test)
expected = y_test
print("RMS: %s" % np.sqrt(np.mean((predicted - expected) ** 2)))

We can plot the error, with predicted values as a function of expected values:
plt.scatter(expected, predicted)
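The excerpt leaves data unspecified. The runnable version below assumes the bundled diabetes dataset in its place, which is an illustrative choice and not the source's. random_state=0 is also an addition, used only to make the split reproducible. With the default test_size of 0.25, one quarter of the rows (rounded up) is held out.

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Assumption: load_diabetes stands in for the unspecified `data` object
data = load_diabetes()

# random_state is not in the excerpt; it is added here so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, random_state=0)

clf = LinearRegression()
clf.fit(X_train, y_train)

predicted = clf.predict(X_test)
expected = y_test

# Root-mean-square error on the held-out rows
rms = np.sqrt(np.mean((predicted - expected) ** 2))
print("RMS: %s" % rms)
```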

2016