SVM and the Magic of Kernels

 

SVM and Kernels

In Part 1, we saw how SVM separates classes with a hyperplane. But what if the data isn’t linearly separable?

Example: Imagine classifying points arranged in concentric circles. No straight line (hyperplane) can separate them.

This is where Kernels become powerful.
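
As a quick preview (a minimal sketch using scikit-learn; the dataset and parameter values are purely illustrative), a linear SVM barely beats chance on concentric circles, while an SVM with the RBF kernel introduced below separates them almost perfectly:

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric circles: inner circle = class 1, outer ring = class 0
X, y = make_circles(n_samples=500, factor=0.3, noise=0.05, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# A linear SVM cannot find a separating hyperplane in the original 2D space
linear_svm = SVC(kernel="linear").fit(X_train, y_train)
print("Linear kernel accuracy:", linear_svm.score(X_test, y_test))  # roughly chance level

# An RBF-kernel SVM separates the circles almost perfectly
rbf_svm = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)
print("RBF kernel accuracy:", rbf_svm.score(X_test, y_test))        # close to 1.0
```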


What is a Kernel?

A kernel is a mathematical function that allows SVM to work in higher-dimensional spaces without explicitly computing the coordinates in that space.

This is known as the Kernel Trick.

Instead of mapping data explicitly into higher dimensions via $\phi(x)$, kernels compute the dot product in that space directly:

K(x_i, x_j) = \phi(x_i) \cdot \phi(x_j)

This makes computation efficient and feasible even in very high dimensions.
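
To make this concrete, here is a small check (a sketch with a hand-picked feature map and arbitrary vectors): for the degree-2 polynomial kernel with $c = 1$ on 2D inputs, the explicit map $\phi(x) = (x_1^2, x_2^2, \sqrt{2}\,x_1 x_2, \sqrt{2}\,x_1, \sqrt{2}\,x_2, 1)$ gives exactly the same value as computing $(x_i \cdot x_j + 1)^2$ directly, without ever building the 6-dimensional vectors.

```python
import numpy as np

def phi(x):
    """Explicit degree-2 polynomial feature map for a 2D input (c = 1)."""
    x1, x2 = x
    return np.array([x1**2, x2**2,
                     np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     1.0])

def poly_kernel(x, z, c=1.0, d=2):
    """Polynomial kernel: works directly on the original 2D vectors."""
    return (np.dot(x, z) + c) ** d

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

print(np.dot(phi(x), phi(z)))  # 144.0  (explicit mapping into 6 dimensions)
print(poly_kernel(x, z))       # 144.0  (kernel trick, never leaves 2D)
```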


Common Kernels

1. Linear Kernel

K(x_i, x_j) = x_i \cdot x_j
  • Used when data is linearly separable.

  • Fast and simple.


2. Polynomial Kernel

K(x_i, x_j) = (x_i \cdot x_j + c)^d
  • Allows curved decision boundaries.

  • Degree $d$ controls flexibility.


3. Radial Basis Function (RBF) / Gaussian Kernel

K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)
  • Most widely used kernel.

  • Handles complex, non-linear boundaries.

  • $\gamma$ controls how far the influence of a point reaches.


4. Sigmoid Kernel

K(x_i, x_j) = \tanh(\alpha \, x_i \cdot x_j + c)
  • Inspired by neural networks (acts like an activation function).
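
All four kernels above can be written in a few lines of NumPy, and scikit-learn's SVC selects them by name (a sketch; the parameter values are illustrative defaults, not tuned for any particular dataset):

```python
import numpy as np
from sklearn.svm import SVC

def linear_kernel(xi, xj):
    return np.dot(xi, xj)

def polynomial_kernel(xi, xj, c=1.0, d=3):
    return (np.dot(xi, xj) + c) ** d

def rbf_kernel(xi, xj, gamma=0.5):
    return np.exp(-gamma * np.sum((xi - xj) ** 2))

def sigmoid_kernel(xi, xj, alpha=0.01, c=0.0):
    return np.tanh(alpha * np.dot(xi, xj) + c)

# In scikit-learn the same kernels are chosen by name:
svm_linear  = SVC(kernel="linear")
svm_poly    = SVC(kernel="poly", degree=3, coef0=1.0)       # coef0 corresponds to c (scikit-learn also scales the dot product by gamma)
svm_rbf     = SVC(kernel="rbf", gamma=0.5)
svm_sigmoid = SVC(kernel="sigmoid", gamma=0.01, coef0=0.0)  # gamma corresponds to alpha, coef0 to c
```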


Choosing the Right Kernel

  • Linear Kernel: when features are already separable or dataset is very large.

  • Polynomial Kernel: when interaction between features is important.

  • RBF Kernel: the default choice when unsure; works well in most cases.

  • Sigmoid Kernel: rarely used, but can work in some neural-network-like settings.
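
In practice, the choice is usually settled empirically, for example by cross-validating over kernels and their main hyperparameters (a sketch using GridSearchCV; the dataset and parameter grid are placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Placeholder dataset; substitute your own features and labels
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Cross-validate over kernels and their main hyperparameters
param_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["poly"], "degree": [2, 3], "C": [0.1, 1, 10]},
    {"kernel": ["rbf"], "gamma": ["scale", 0.1, 1.0], "C": [0.1, 1, 10]},
]

search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print("Best kernel and parameters:", search.best_params_)
print("Cross-validated accuracy:", search.best_score_)
```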
