sklearn.linear_model.HuberRegressor

A sklearn.linear_model.HuberRegressor is a Huber Regression System within the sklearn.linear_model module.

AKA: HuberRegressor, linear_model.HuberRegressor.
Context
- Usage:

1) Import Huber Regression model from scikit-learn: from sklearn.linear_model import HuberRegressor

2) Create design matrix X and response vector Y

3) Create Huber Regression object: Hreg=HuberRegressor([epsilon=1.35, max_iter=100, alpha=0.0001, warm_start=False, fit_intercept=True, tol=1e-05])

4) Choose method(s):

Fit the Huber Regression model to the dataset: Hreg.fit(X, Y[, sample_weight])
Predict Y using the linear model with estimated coefficients: Y_pred = Hreg.predict(X)
Return coefficient of determination (R^2) of the prediction: Hreg.score(X,Y[, sample_weight=w])
Get estimator parameters: Hreg.get_params([deep])
Set estimator parameters: Hreg.set_params(**params)
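The usage steps above can be run end to end as shown below. This is a minimal sketch that uses small synthetic data generated with NumPy in place of a real design matrix. The coefficients, intercept, and noise level are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor

# Step 2: assumed synthetic design matrix X and response vector Y
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
Y = X @ np.array([1.5, -2.0, 0.5]) + 0.3 + rng.normal(scale=0.1, size=100)

# Step 3: create the Huber regression object (the documented defaults, spelled out)
Hreg = HuberRegressor(epsilon=1.35, max_iter=100, alpha=0.0001,
                      warm_start=False, fit_intercept=True, tol=1e-05)

# Step 4: fit, predict, score, and get/set parameters
Hreg.fit(X, Y)
Y_pred = Hreg.predict(X)
r2 = Hreg.score(X, Y)          # coefficient of determination R^2
params = Hreg.get_params()     # dict of constructor parameters
Hreg.set_params(epsilon=1.5)   # updates the parameter; a refit is needed for it to take effect
print(Y_pred.shape, round(r2, 4), params["epsilon"], Hreg.epsilon)
```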

Example(s):
- 10-fold sklearn Boston data evaluation [1]

Input:
# Importing modules
from sklearn.linear_model import HuberRegressor
from sklearn.model_selection import cross_val_predict
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; requires an older release
from sklearn.metrics import explained_variance_score, mean_squared_error
import numpy as np
import pylab as pl

boston = load_boston()  # Loading Boston dataset
x = boston.data  # Creating regression design matrix
y = boston.target  # Creating target dataset
Hreg = HuberRegressor(epsilon=1.0)  # Create Huber regression object
yp = Hreg.fit(x, y).predict(x)  # Predicted values on the full dataset

# Calculation of RMSE and explained variances
yp_cv = cross_val_predict(Hreg, x, y, cv=10)  # 10-fold CV predictions
Evariance = explained_variance_score(y, yp)
Evariance_cv = explained_variance_score(y, yp_cv)
RMSE = np.sqrt(mean_squared_error(y, yp))
RMSECV = np.sqrt(mean_squared_error(y, yp_cv))

# Printing results
print('Method: Huber Regression')
print('RMSE on the dataset: %.4f' % RMSE)
print('RMSE on 10-fold CV: %.4f' % RMSECV)
print('Explained Variance Regression Score on the dataset: %.4f' % Evariance)
print('Explained Variance Regression 10-fold CV: %.4f' % Evariance_cv)

# Plotting real vs predicted data
pl.figure(1)
pl.plot(yp, y, 'ro')
pl.plot(yp_cv, y, 'bo', alpha=0.25, label='10-folds CV')
pl.xlabel('predicted')
pl.title('Huber Regression, epsilon=1.0')
pl.ylabel('real')
pl.grid(True)
pl.show()

Output:
[Plot: real vs. predicted values for Huber Regression, epsilon=1.0; blue dots correspond to 10-fold CV]
Method: Huber Regression
RMSE on the dataset: 5.1709
RMSE on 10-fold CV: 6.4916
Explained Variance Regression Score on the dataset: 0.6932
Explained Variance Regression 10-fold CV: 0.5027
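The Boston evaluation above will not run as written on current releases, because load_boston was removed from scikit-learn in version 1.2. The sketch below applies the same 10-fold evaluation to synthetic data from make_regression. The make_regression settings are assumptions, and max_iter is raised above its default only as a precaution. These are not the Boston data, so the printed numbers will not match the output reported above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import HuberRegressor
from sklearn.metrics import explained_variance_score, mean_squared_error
from sklearn.model_selection import cross_val_predict

# Synthetic stand-in for the Boston data (assumption: 200 samples, 5 features)
x, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

# max_iter raised from the default of 100 as a precaution for unscaled targets
Hreg = HuberRegressor(epsilon=1.0, max_iter=1000)
yp = Hreg.fit(x, y).predict(x)                  # in-sample predictions
yp_cv = cross_val_predict(Hreg, x, y, cv=10)    # 10-fold CV predictions

RMSE = np.sqrt(mean_squared_error(y, yp))
RMSECV = np.sqrt(mean_squared_error(y, yp_cv))
Evariance = explained_variance_score(y, yp)
Evariance_cv = explained_variance_score(y, yp_cv)

print('Method: Huber Regression')
print('RMSE on the dataset: %.4f' % RMSE)
print('RMSE on 10-fold CV: %.4f' % RMSECV)
print('Explained Variance Regression Score on the dataset: %.4f' % Evariance)
print('Explained Variance Regression 10-fold CV: %.4f' % Evariance_cv)
```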

Counter-Example(s):
See: Regression System, Regressor, Cross-Validation Task, Ridge Regression Task, Bayesian Analysis.

References

2017

http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.HuberRegressor.html
- QUOTE: class sklearn.linear_model.HuberRegressor(epsilon=1.35, max_iter=100, alpha=0.0001, warm_start=False, fit_intercept=True, tol=1e-05)

Linear regression model that is robust to outliers.

The Huber Regressor optimizes the squared loss for the samples where |(y - X'w) / sigma| < epsilon and the absolute loss for the samples where |(y - X'w) / sigma| > epsilon, where w and sigma are parameters to be optimized. The parameter sigma makes sure that if y is scaled up or down by a certain factor, one does not need to rescale epsilon to achieve the same robustness. Note that this does not take into account the fact that the different features of X may be of different scales.

This makes sure that the loss function is not heavily influenced by the outliers while not completely ignoring their effect.
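The quoted split between squared and absolute loss can be made concrete. According to the scikit-learn user guide, HuberRegressor minimizes, over w and sigma, the sum over samples of (sigma + sigma * H_eps((X_i w - y_i) / sigma)) plus alpha * ||w||^2. Here H_eps(z) = z^2 if |z| < eps, and 2 * eps * |z| - eps^2 otherwise. The sketch below re-implements H_eps and that objective in NumPy; the helper names huber_H and huber_objective are hypothetical and are not scikit-learn functions. It fits HuberRegressor, and LinearRegression for comparison, on assumed synthetic data in which the ten samples with the largest x are shifted down by 30. It then checks which samples fall in the absolute-loss branch.

```python
import numpy as np
from sklearn.linear_model import HuberRegressor, LinearRegression


def huber_H(z, epsilon):
    """Hypothetical helper: H_eps(z) = z**2 if |z| < eps, else 2*eps*|z| - eps**2."""
    z = np.abs(z)
    return np.where(z < epsilon, z ** 2, 2 * epsilon * z - epsilon ** 2)


def huber_objective(w, b, sigma, X, y, epsilon=1.35, alpha=0.0001):
    """Hypothetical helper: sum_i (sigma + sigma * H_eps((X_i w + b - y_i) / sigma)) + alpha * ||w||^2.

    The intercept b is not penalized.
    """
    z = (X @ w + b - y) / sigma
    return np.sum(sigma + sigma * huber_H(z, epsilon)) + alpha * np.dot(w, w)


# Worked check with eps = 1.35: 0.5 lies in the squared branch (0.25);
# -2.0 lies in the linear branch (2*1.35*2.0 - 1.35**2 = 3.5775)
vals = huber_H(np.array([0.5, -2.0]), 1.35)

# Assumed synthetic data: true slope 3, intercept 0, Gaussian noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = 3.0 * X.ravel() + rng.normal(scale=0.5, size=100)

# Corrupt the 10 samples with the largest x by shifting them down by 30
idx = np.argsort(X.ravel())[-10:]
y[idx] -= 30.0

huber = HuberRegressor().fit(X, y)   # default epsilon=1.35
ols = LinearRegression().fit(X, y)   # ordinary least squares, for comparison

# Samples with |residual| > epsilon * sigma fall in the absolute-loss branch;
# the fitted scale_ attribute plays the role of sigma
residual = np.abs(y - X @ huber.coef_ - huber.intercept_)
mask = residual > huber.epsilon * huber.scale_
obj = huber_objective(huber.coef_, huber.intercept_, huber.scale_, X, y)

print("Huber slope: %.3f, OLS slope: %.3f" % (huber.coef_[0], ols.coef_[0]))
print("samples in absolute-loss branch:", mask.sum(), "objective: %.3f" % obj)
```

Under these assumptions, the Huber slope should stay close to 3, while the least-squares slope is pulled toward the corrupted points. The recomputed mask should also agree with the fitted outliers_ attribute, assuming scikit-learn applies the same residual > epsilon * scale_ rule.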
