Understanding Support Vector Machines (SVM)


The Math Behind It

Support Vector Machines (SVMs) are among the most powerful and popular algorithms in machine learning. Known for their ability to handle classification tasks with high accuracy, SVMs are widely used in text classification, bioinformatics, and image recognition.

But to truly understand SVMs, let’s break down the math step by step.


What is an SVM?

At its core, an SVM tries to find the best boundary (hyperplane) that separates data points of different classes.

  • For 2D data, this hyperplane is just a line.

  • For 3D data, it’s a plane.

  • In higher dimensions, it’s called a hyperplane.

The best hyperplane is the one that maximizes the margin - the distance between the hyperplane and the nearest data points (called support vectors).
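As a quick illustration, here is a minimal sketch (assuming scikit-learn and a tiny made-up 2D dataset) that fits a linear SVM and prints the support vectors it ends up using:

```python
import numpy as np
from sklearn.svm import SVC

# Tiny made-up 2D dataset with two classes, labelled -1 and +1
X = np.array([[2.0, 2.0], [3.0, 3.0], [3.0, 1.0],
              [-1.0, -1.0], [-2.0, -1.5], [-1.5, 0.0]])
y = np.array([1, 1, 1, -1, -1, -1])

# Linear kernel -> the decision boundary is a straight line in 2D;
# a large C approximates a hard margin on separable data
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

print("support vectors:\n", clf.support_vectors_)
print("w =", clf.coef_[0], " b =", clf.intercept_[0])
```

The points returned by support_vectors_ are exactly the samples closest to the boundary, i.e. the ones that define the margin.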




The Math of SVM

1. The Hyperplane Equation

A hyperplane in n-dimensions can be written as:

w \cdot x + b = 0
  • w → weight vector (normal to the hyperplane)

  • x → input vector

  • b → bias term
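
As a quick numerical check with made-up values for w and b, a point lies on the hyperplane exactly when w · x + b evaluates to zero:

```python
import numpy as np

w = np.array([1.0, -1.0])   # hypothetical weight vector (normal to the hyperplane)
b = 0.0                     # hypothetical bias term

x_on_plane = np.array([2.0, 2.0])   # w·x + b = 0  -> lies on the hyperplane
x_above    = np.array([3.0, 1.0])   # w·x + b > 0  -> one side
x_below    = np.array([1.0, 3.0])   # w·x + b < 0  -> other side

for x in (x_on_plane, x_above, x_below):
    print(x, "->", np.dot(w, x) + b)
```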


2. The Classification Rule

For a data point x_i:

y_i = \text{sign}(w \cdot x_i + b)
  • If result > 0 → class +1

  • If result < 0 → class -1
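
In code, the rule is simply the sign of the decision value, again with hypothetical w and b:

```python
import numpy as np

w = np.array([1.0, -1.0])   # hypothetical weights
b = 0.0                     # hypothetical bias

X = np.array([[3.0, 1.0],   # decision value +2 -> class +1
              [1.0, 3.0],   # decision value -2 -> class -1
              [0.5, 2.5]])  # decision value -2 -> class -1

predictions = np.sign(X @ w + b)
print(predictions)          # [ 1. -1. -1.]
```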


3. Margin Maximization

The margin is defined as:

\text{Margin} = \frac{2}{\|w\|}

SVM tries to maximize the margin, which is equivalent to minimizing \|w\|.
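
Given a fitted linear model (reusing the scikit-learn sketch from above, with its made-up data), the margin can be read off the learned weights:

```python
import numpy as np
from sklearn.svm import SVC

# Same made-up separable data as in the earlier sketch
X = np.array([[2.0, 2.0], [3.0, 3.0], [3.0, 1.0],
              [-1.0, -1.0], [-2.0, -1.5], [-1.5, 0.0]])
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # large C ~ nearly hard margin

w = clf.coef_[0]
print("||w|| =", np.linalg.norm(w))
print("margin =", 2.0 / np.linalg.norm(w))
```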


4. The Optimization Problem

Formally, we solve:

\min \frac{1}{2}\|w\|^2

subject to:

y_i (w \cdot x_i + b) \geq 1 \quad \forall i

This is a convex optimization problem - which means there’s a unique global solution.
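
Because it is a convex quadratic program, the problem can be handed to a general-purpose solver. The sketch below, which assumes the cvxpy library and a tiny separable toy dataset, solves the hard-margin primal directly:

```python
import numpy as np
import cvxpy as cp

# Tiny separable toy dataset, labels in {-1, +1}
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.5]])
y = np.array([1, 1, -1, -1])
n, d = X.shape

w = cp.Variable(d)
b = cp.Variable()

# Hard-margin primal: minimize 1/2 ||w||^2  s.t.  y_i (w·x_i + b) >= 1
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()

print("w =", w.value, " b =", b.value)
```

In practice, SVM libraries use specialized solvers, but this small QP illustrates the same objective and constraints.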


5. Soft Margin SVM

Real-world data isn’t always perfectly separable. That’s where slack variables (\xi_i) come in, allowing misclassifications:

y_i (w \cdot x_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0

And the objective becomes:

\min \frac{1}{2}\|w\|^2 + C \sum \xi_i

where C controls the tradeoff between maximizing margin and minimizing classification error.
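
The same toy formulation extends to the soft margin by adding the slack variables and the C-weighted penalty; here the dataset, the cvxpy setup, and the choice C = 1.0 are all illustrative assumptions:

```python
import numpy as np
import cvxpy as cp

# Small made-up dataset that is not perfectly separable
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.5, 0.5],
              [-1.0, -1.0], [-2.0, -1.5], [1.0, 1.0]])
y = np.array([1, 1, 1, -1, -1, -1])
n, d = X.shape
C = 1.0                                   # tradeoff parameter (arbitrary choice)

w = cp.Variable(d)
b = cp.Variable()
xi = cp.Variable(n, nonneg=True)          # slack variables, one per point

# min 1/2 ||w||^2 + C * sum(xi)  s.t.  y_i (w·x_i + b) >= 1 - xi_i
objective = cp.Minimize(0.5 * cp.sum_squares(w) + C * cp.sum(xi))
constraints = [cp.multiply(y, X @ w + b) >= 1 - xi]
cp.Problem(objective, constraints).solve()

print("w =", w.value, " b =", b.value)
print("slacks:", np.round(xi.value, 3))
```

Points with \xi_i > 0 are the ones that end up inside the margin or on the wrong side of the boundary.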
