Linear Models
Regression
Linear Regression
Model: $\hat{y} = h_{\boldsymbol\theta}(\mathbf{x}) = \boldsymbol\theta^\top \mathbf{x} = \theta_0 + \theta_1 x_1 + \dots + \theta_n x_n$
Cost Function: Mean Squared Error, $\mathrm{MSE}(\boldsymbol\theta) = \frac{1}{m}\sum_{i=1}^{m}\left(\boldsymbol\theta^\top \mathbf{x}^{(i)} - y^{(i)}\right)^2$
Training:
- The Normal Equation: $\hat{\boldsymbol\theta} = (\mathbf{X}^\top \mathbf{X})^{-1}\mathbf{X}^\top \mathbf{y}$ (a NumPy sketch follows this list)
- Gradient Descent: refer to Gradient Descent
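As a sanity check, here is a minimal NumPy sketch of the Normal Equation on hypothetical synthetic data (all variable names are illustrative, not from the source):

```python
import numpy as np

# Hypothetical synthetic data: m instances, n features
m, n = 100, 3
X = np.random.rand(m, n)
y = 4 + X @ np.array([2.0, -1.0, 0.5]) + 0.1 * np.random.randn(m)

X_b = np.c_[np.ones((m, 1)), X]                       # prepend x0 = 1 for the bias term
theta_hat = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y    # (X^T X)^-1 X^T y
```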
The Normal Equation vs. Gradient Descent

| | Normal Equation | Gradient Descent |
|---|---|---|
| Pros | No learning rate to tune; no iterations; exact closed-form solution | Scales well to a large number of features; also works when $\mathbf{X}^\top\mathbf{X}$ is not invertible |
| Cons | Computing $(\mathbf{X}^\top\mathbf{X})^{-1}$ costs roughly $O(n^{2.4})$ to $O(n^3)$ in the number of features $n$ | Requires choosing a learning rate and number of iterations; sensitive to feature scaling |
Code

Linear Regression with the Normal Equation:

```python
from sklearn.linear_model import LinearRegression

lin_reg = LinearRegression()
lin_reg.fit(X_train, Y_train)
lin_reg.predict(X_new)
```
Linear Regression with Gradient Descent:

```python
from sklearn.linear_model import SGDRegressor

# max_iter replaces the older n_iter parameter in current scikit-learn
sgd_reg = SGDRegressor(max_iter=n_epochs, penalty=None, eta0=learning_rate)
sgd_reg.fit(X_train, Y_train)
```
Ridge Regression
Add L2 regularization to Linear Regression's cost function.
Model: the same as Linear Regression
Cost Function: $J(\boldsymbol\theta) = \mathrm{MSE}(\boldsymbol\theta) + \alpha\frac{1}{2}\sum_{i=1}^{n}\theta_i^2$
Training:
- The Normal Equation (Cholesky): $\hat{\boldsymbol\theta} = (\mathbf{X}^\top \mathbf{X} + \alpha\mathbf{A})^{-1}\mathbf{X}^\top \mathbf{y}$, where $\mathbf{A}$ is the $(n+1)\times(n+1)$ identity matrix except with a 0 in the top-left cell, since the bias term $\theta_0$ is not regularized (a NumPy sketch follows this list)
- Gradient Descent: refer to Gradient Descent
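A minimal NumPy sketch of the closed-form equation above, including the modified identity matrix $\mathbf{A}$ (function and variable names are illustrative, not sklearn's implementation):

```python
import numpy as np

def ridge_normal_equation(X, y, alpha=1.0):
    """Closed-form Ridge fit; a sketch of the equation above."""
    m, n = X.shape
    X_b = np.c_[np.ones((m, 1)), X]    # prepend x0 = 1 for the bias term
    A = np.eye(n + 1)
    A[0, 0] = 0.0                      # bias term theta_0 is not regularized
    return np.linalg.solve(X_b.T @ X_b + alpha * A, X_b.T @ y)
```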
Code
Ridge Regression with the Normal Equation (Cholesky solver):

```python
from sklearn.linear_model import Ridge

ridge_reg = Ridge(alpha=1, solver='cholesky')
ridge_reg.fit(X_train, Y_train)
ridge_reg.predict(X_new)
```
Ridge Regression with Gradient Descent:

```python
from sklearn.linear_model import SGDRegressor

sgd_reg = SGDRegressor(penalty="l2")
sgd_reg.fit(X_train, Y_train)
```
Lasso Regression
Add L1 regularization to Linear Regression's cost function.
Model: the same as Linear Regression
Cost Function: $J(\boldsymbol\theta) = \mathrm{MSE}(\boldsymbol\theta) + \alpha\sum_{i=1}^{n}|\theta_i|$
Training:
The Lasso cost function is not differentiable at $\theta_i = 0$, but Gradient Descent still works fine if a subgradient vector is used when $\theta_i = 0$ (a sketch follows below).
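A minimal sketch of one subgradient step on the Lasso cost (names are illustrative; `np.sign(0) == 0`, which is a valid choice of subgradient at $\theta_i = 0$):

```python
import numpy as np

def lasso_subgradient_step(theta, X_b, y, alpha, eta):
    """One (sub)gradient step on the Lasso cost; a sketch, not sklearn's solver."""
    m = len(y)
    mse_grad = (2.0 / m) * X_b.T @ (X_b @ theta - y)   # gradient of the MSE term
    l1_subgrad = alpha * np.sign(theta)                # sign(0) = 0: valid subgradient at 0
    l1_subgrad[0] = 0.0                                # bias term is not regularized
    return theta - eta * (mse_grad + l1_subgrad)
```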
Code

Lasso Regression:

```python
from sklearn.linear_model import Lasso

lasso_reg = Lasso()
lasso_reg.fit(X_train, Y_train)
lasso_reg.predict(X_new)
```

Lasso Regression with Gradient Descent:

```python
from sklearn.linear_model import SGDRegressor

sgd_reg = SGDRegressor(penalty='l1')
sgd_reg.fit(X_train, Y_train)
sgd_reg.predict(X_new)
```
Elastic Net
Add a mix of L1 and L2 regularization to Linear Regression's cost function.
Model: the same as Linear Regression
Cost Function: $J(\boldsymbol\theta) = \mathrm{MSE}(\boldsymbol\theta) + r\alpha\sum_{i=1}^{n}|\theta_i| + \frac{1 - r}{2}\alpha\sum_{i=1}^{n}\theta_i^2$, where $r$ is the mix ratio ($r = 1$ is equivalent to Lasso, $r = 0$ to Ridge)
Training: Gradient Descent, using a subgradient for the L1 term as in Lasso
Code

```python
from sklearn.linear_model import ElasticNet

elastic_net = ElasticNet(alpha=0.1, l1_ratio=0.5)
elastic_net.fit(X_train, Y_train)
elastic_net.predict(X_new)
```
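Elastic Net can also be trained with SGD by setting `penalty='elasticnet'`; the hyperparameter values below are illustrative:

```python
from sklearn.linear_model import SGDRegressor

# l1_ratio controls the L1/L2 mix, matching ElasticNet's l1_ratio
sgd_reg = SGDRegressor(penalty='elasticnet', l1_ratio=0.5, alpha=0.1)
sgd_reg.fit(X_train, Y_train)
sgd_reg.predict(X_new)
```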
Comparison Between L1 Regularization and L2 Regularization

| | L1 Regularization | L2 Regularization |
|---|---|---|
| Solution uniqueness | No | Yes |
| Sparsity | Yes | No |
| Feature selection | Yes | No |
| Computational efficiency | Low (no analytical solution) | High |
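A quick way to see the sparsity difference is to compare fitted coefficients on the same data; a sketch on hypothetical synthetic data where only 2 of 10 features matter:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.RandomState(42)
X = rng.randn(200, 10)
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.randn(200)   # only 2 informative features

print(Lasso(alpha=0.1).fit(X, y).coef_)   # most coefficients driven exactly to 0
print(Ridge(alpha=0.1).fit(X, y).coef_)   # coefficients shrunk but stay nonzero
```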
How to Choose Between Regression Models
- It is always preferable to have some regularization; thus, avoid plain Linear Regression models.
- Ridge Regression is a good default.
- If you suspect only a few features are actually useful, use either Lasso Regression or Elastic Net.
- Generally speaking, Elastic Net is preferable to Lasso, since Lasso may behave erratically (1) when the number of features ($n$) is greater than the number of training instances ($m$) or (2) when several features are strongly correlated.
Classification
Logistic Regression (Binary Classes)
Model
Estimate Probabilities: $\hat{p} = h_{\boldsymbol\theta}(\mathbf{x}) = \sigma(\boldsymbol\theta^\top \mathbf{x})$, where $\sigma(t) = \dfrac{1}{1 + e^{-t}}$ is the sigmoid function
Prediction: $\hat{y} = 1$ if $\hat{p} \geq 0.5$ (i.e. $\boldsymbol\theta^\top \mathbf{x} \geq 0$), else $\hat{y} = 0$
Cost Function: the log loss, $J(\boldsymbol\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log\hat{p}^{(i)} + (1 - y^{(i)})\log(1 - \hat{p}^{(i)})\right]$
Training
The cost function above is convex, so Gradient Descent is guaranteed to find the global minimum (a sketch follows below).
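A minimal batch Gradient Descent sketch for the log loss (names are illustrative; the gradient is $\frac{1}{m}\mathbf{X}^\top(\sigma(\mathbf{X}\boldsymbol\theta) - \mathbf{y})$):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def logistic_gd(X_b, y, eta=0.1, n_epochs=1000):
    """Batch Gradient Descent on the log loss; a sketch, not sklearn's solver."""
    m, n = X_b.shape
    theta = np.zeros(n)
    for _ in range(n_epochs):
        gradient = X_b.T @ (sigmoid(X_b @ theta) - y) / m   # gradient of the log loss
        theta -= eta * gradient
    return theta
```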
Code
```python
from sklearn.linear_model import LogisticRegression

log_reg = LogisticRegression()
log_reg.fit(X_train, Y_train)
log_reg.predict(X_new)
```
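Note that `predict` returns class labels; to get the estimated probabilities $\hat{p}$ themselves, use `predict_proba`:

```python
log_reg.predict_proba(X_new)   # column 0: P(y=0 | x), column 1: P(y=1 | x)
```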
Logistic Regression (Multiple Classes)
Models
Compute a score for each class: $s_k(\mathbf{x}) = \boldsymbol\theta_k^\top \mathbf{x}$
Note: each class has its own parameter vector $\boldsymbol\theta_k$
Normalize the scores with the Softmax function: $\hat{p}_k = \dfrac{\exp(s_k(\mathbf{x}))}{\sum_{j=1}^{K}\exp(s_j(\mathbf{x}))}$
Prediction: $\hat{y} = \operatorname{argmax}_k \hat{p}_k = \operatorname{argmax}_k s_k(\mathbf{x})$
Cost Function: the cross entropy, $J(\boldsymbol\Theta) = -\frac{1}{m}\sum_{i=1}^{m}\sum_{k=1}^{K} y_k^{(i)}\log\hat{p}_k^{(i)}$, where $y_k^{(i)}$ is 1 if the target class of the $i$-th instance is $k$, else 0
Training: Gradient Descent (a NumPy sketch of the cost follows below)
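A minimal NumPy sketch of the softmax probabilities and cross-entropy cost (names are illustrative: `Theta` is assumed to be an $(n+1)\times K$ parameter matrix, `Y_onehot` an $m\times K$ one-hot target matrix):

```python
import numpy as np

def softmax(S):
    exp_s = np.exp(S - S.max(axis=1, keepdims=True))   # shift scores for numerical stability
    return exp_s / exp_s.sum(axis=1, keepdims=True)

def cross_entropy(X_b, Y_onehot, Theta):
    """Cross-entropy cost of the softmax model; a sketch, not sklearn's code."""
    P_hat = softmax(X_b @ Theta)     # scores s_k = theta_k^T x, then softmax
    m = len(X_b)
    return -np.sum(Y_onehot * np.log(P_hat + 1e-12)) / m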
Code
```python
from sklearn.linear_model import LogisticRegression

multi_log_reg = LogisticRegression(multi_class='multinomial', solver='sag', C=10)
multi_log_reg.fit(X_train, Y_train)
multi_log_reg.predict(X_new)
```