Support Vector Machines
Classification
Model
High Level Idea of SVMs
Functional Margin
Definition:
Some Observations:
- If $y^{(i)} = 1$, the larger $w^Tx + b$ is, the larger the functional margin will be.
- The value of $\hat \gamma$ is affected by the scale of $w$ and $b$; therefore, it is reasonable to introduce normalization.
Functional Margin for a dataset with $m$ instances:
Geometric Margin
Geometric Margin
(Original Graph: http://cs229.stanford.edu/notes/cs229-notes3.pdf)
Define the geometric margin of ($w$, $b$) with respect to a training example ($x^{(i)}$, $y^{(i)}$):
Geometric Margin for a dataset with $m$ instances:
Training
To train SVM, it is equivalent to find a hyperplane that can achieves the maximum geometric margin
Since "$\Vert w \Vert = 1$" is a non-convex constraint, transform the problem into the following:
Introduce the scaling constraint that the functional margin of $w, b$ with respect to the training set must be 1. That is, $\hat \gamma = 1$. The problem has become a convex quadratic objective with linear constraints and its solution gives use the optimal margin classifier.
The dual problem that used for Kernel tricks
For soft margin classification:
Linear models
- Hard Margin: Strictly enforce that all instances must be off the margin and on the right sides.
- only works for linear separable datasets
- sensitive to outliers
- Soft Margin: Relax the restriction of hart margin model.
- In Scikit-Learn's SVM classes, hyperparameter C is used to control the flexibility. A smaller C leads to a wider margin but more margin violations
from sklearn.pipeline import pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
svm_clf = Pipeline([
('scaler', StandardScaler()),
('linear_svc', LinearSVC(C=1, loss='hinge')),
])
svm_clf.fit(X_train, Y_train)
Non-linear models
In the dual form optimization problem above, ${\mathbf x^{(i)}}^T \mathbf x^{(j)}$ can be replaced with $K(\mathbf a, \mathbf b)$
- Polynomial Kernel
from sklearn.pipeline import pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
poly_kernel_svm_clf = Pipeline([
('scaler', StandardScaler()),
('poly_svc', SVC(kernel='poly', degree=3, coef0=3, C=5)),
])
poly_kernel_svm_clf.fit(X_train, Y_train)
- Gaussian RBF Kernel
from sklearn.pipeline import pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
rbf_kernel_svm_clf = Pipeline([
('scaler', StandardScaler()),
('rbf_svc', SVC(kernel='rbf', gamma=5, C=0.001)),
])
rbf_kernel_svm_clf.fit(X_train, Y_train)
- Sigmoid
from sklearn.pipeline import pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
sigmoid_kernel_svm_clf = Pipeline([
('scaler', StandardScaler()),
('sigmoid_svc', SVC(kernel='sigmoid')),
])
sigmoid_kernel_svm_clf.fit(X_train, Y_train)
Regression
Model
In an SVM regression problem, a model tries to keep as many points as possible inside a margin $\epsilon$.
Training
The optimization problem for SVM regression is the following:
To allow some points to be outside the margin, the optimization problem becomes:
Code
from sklearn import LinearSVR
svm_reg = LinearSVR(epsilon=1.5)
svm_reg.fit(X_train, Y_train)
from sklearn import SVR
svm_reg = SVR(kernel='poly', degree=2, c=100, epsilon=0.5)
svm_reg.fit(X_train, Y_train)