Entropy


Entropy: The Measure of Uncertainty

In information theory, entropy is a way to measure disorder, uncertainty, or impurity in a dataset.

  • If your dataset is pure (all data points belong to the same class), entropy = 0.

  • If your dataset is a 50/50 mix of two classes, entropy is at its maximum of 1 (most uncertain), as the quick check below illustrates.
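
Both extremes are easy to verify numerically. Here is a quick sketch (assuming SciPy is available; scipy.stats.entropy computes entropy from a list of class probabilities):

```python
from scipy.stats import entropy

# Pure dataset: one class holds 100% of the points -> entropy is 0.
print(entropy([1.0], base=2))       # 0.0

# 50/50 mix of two classes -> entropy peaks at 1 bit.
print(entropy([0.5, 0.5], base=2))  # 1.0
```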


Formula

For a dataset $S$ with class probabilities $p_1, p_2, \dots, p_n$:

$$H(S) = -\sum_{i=1}^{n} p_i \log_2(p_i)$$

  • $p_i$ = proportion of data points belonging to class $i$.
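
The formula translates directly into code. A minimal sketch in Python (the function name and the use of collections.Counter are my own choices, not from the original post):

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum_i p_i * log2(p_i), where p_i is the proportion of class i."""
    n = len(labels)
    # p_i = proportion of data points belonging to class i; Counter only
    # yields classes that actually occur, so p_i is never zero here.
    proportions = (count / n for count in Counter(labels).values())
    return -sum(p * math.log2(p) for p in proportions)
```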


Example

Let’s say we have the dataset below:

grade   bumpiness   speed limit   speed
steep   bumpy       yes           slow
steep   smooth      yes           slow
flat    bumpy       no            fast
steep   smooth      no            fast

The entropy of the target column, speed, is calculated as follows:

Calculation

The speed column has two "slow" and two "fast" labels out of four rows, so $p_{\text{slow}} = p_{\text{fast}} = 2/4 = 0.5$:

$$H(S) = -(0.5 \log_2 0.5 + 0.5 \log_2 0.5) = -(-0.5 - 0.5) = 1$$

This is the maximum possible entropy for two classes: the speed column is as impure (uncertain) as it can be.
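
Using the entropy helper sketched above (my own illustration, not part of the original post), the same result can be reproduced:

```python
speed = ["slow", "slow", "fast", "fast"]  # the speed column from the table
print(entropy(speed))                     # 1.0 -> maximum impurity
```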