SVM and the Magic of Kernels

 

SVM and Kernels

In Part 1, we saw how SVM separates classes with a hyperplane. But what if the data isn’t linearly separable?

Example: Imagine classifying points arranged in concentric circles. No straight line (hyperplane) can separate them.

This is where Kernels become powerful.
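
As a quick preview (a minimal sketch using scikit-learn; the dataset and parameter values are purely illustrative), a linear SVM barely beats chance on concentric circles, while an SVM with the RBF kernel introduced below separates them almost perfectly:

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric circles: inner circle = class 1, outer ring = class 0
X, y = make_circles(n_samples=500, factor=0.3, noise=0.05, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# A linear SVM cannot find a separating hyperplane in the original 2D space
linear_svm = SVC(kernel="linear").fit(X_train, y_train)
print("Linear kernel accuracy:", linear_svm.score(X_test, y_test))  # roughly chance level

# An RBF-kernel SVM separates the circles almost perfectly
rbf_svm = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)
print("RBF kernel accuracy:", rbf_svm.score(X_test, y_test))        # close to 1.0
```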


What is a Kernel?

A kernel is a mathematical function that allows SVM to work in higher-dimensional spaces without explicitly computing the coordinates in that space.

This is known as the Kernel Trick.

Instead of mapping data explicitly into higher dimensions via $\phi(x)$, kernels compute the dot product in that space directly:

K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)

This makes computation efficient and feasible even in very high dimensions.
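
To make this concrete, here is a small check (a sketch with a hand-picked feature map and arbitrary vectors): for the degree-2 polynomial kernel with $c = 1$ on 2D inputs, the explicit map $\phi(x) = (x_1^2, x_2^2, \sqrt{2}\,x_1 x_2, \sqrt{2}\,x_1, \sqrt{2}\,x_2, 1)$ gives exactly the same value as computing $(x_i \cdot x_j + 1)^2$ directly, without ever building the 6-dimensional vectors.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 polynomial feature map for a 2D input (c = 1)."""
    x1, x2 = x
    return np.array([x1**2, x2**2,
                     np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     1.0])

def poly_kernel(x, z, c=1.0, d=2):
    """Polynomial kernel: works directly on the original 2D vectors."""
    return (np.dot(x, z) + c) ** d

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

print(np.dot(phi(x), phi(z)))  # 144.0  (explicit mapping into 6 dimensions)
print(poly_kernel(x, z))       # 144.0  (kernel trick, never leaves 2D)
```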


Common Kernels

1. Linear Kernel

K(x_i, x_j) = x_i \cdot x_j
  • Used when data is linearly separable.

  • Fast and simple.


2. Polynomial Kernel

K(x_i, x_j) = (x_i \cdot x_j + c)^d
  • Allows curved decision boundaries.

  • Degree $d$ controls flexibility.


3. Radial Basis Function (RBF) / Gaussian Kernel

K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)
  • Most widely used kernel.

  • Handles complex, non-linear boundaries.

  • $\gamma$ controls how far the influence of a point reaches.


4. Sigmoid Kernel

K(x_i, x_j) = \tanh(\alpha \, x_i \cdot x_j + c)
  • Inspired by neural networks (acts like an activation function).
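
All four kernels above can be written in a few lines of NumPy, and scikit-learn's SVC selects them by name (a sketch; the parameter values are illustrative defaults, not tuned for any particular dataset):

```python
import numpy as np
from sklearn.svm import SVC

def linear_kernel(xi, xj):
    return np.dot(xi, xj)

def polynomial_kernel(xi, xj, c=1.0, d=3):
    return (np.dot(xi, xj) + c) ** d

def rbf_kernel(xi, xj, gamma=0.5):
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

def sigmoid_kernel(xi, xj, alpha=0.01, c=0.0):
    return np.tanh(alpha * np.dot(xi, xj) + c)

# In scikit-learn the same kernels are chosen by name:
svm_linear  = SVC(kernel="linear")
svm_poly    = SVC(kernel="poly", degree=3, coef0=1.0)       # coef0 corresponds to c (scikit-learn also scales the dot product by gamma)
svm_rbf     = SVC(kernel="rbf", gamma=0.5)
svm_sigmoid = SVC(kernel="sigmoid", gamma=0.01, coef0=0.0)  # gamma corresponds to alpha, coef0 to c
```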


Choosing the Right Kernel

  • Linear Kernel: when features are already separable or dataset is very large.

  • Polynomial Kernel: when interaction between features is important.

  • RBF Kernel: the default choice when unsure; works well in most cases.

  • Sigmoid Kernel: rarely used, but can work in some neural-network-like settings.
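
In practice, the choice is usually settled empirically, for example by cross-validating over kernels and their main hyperparameters (a sketch using GridSearchCV; the dataset and parameter grid are placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Placeholder dataset; substitute your own features and labels
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Cross-validate over kernels and their main hyperparameters
param_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["poly"], "degree": [2, 3], "C": [0.1, 1, 10]},
    {"kernel": ["rbf"], "gamma": ["scale", 0.1, 1.0], "C": [0.1, 1, 10]},
]

search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print("Best kernel and parameters:", search.best_params_)
print("Cross-validated accuracy:", search.best_score_)
```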
