Entropy: The Measure of Uncertainty
In information theory, entropy is a way to measure disorder, uncertainty, or impurity in a dataset.
- If your dataset is pure (all data points belong to the same class), entropy = 0.
- If your dataset is a 50/50 mix of two classes, entropy is maximum (most uncertain).
Formula
For a dataset $S$ with $c$ classes:

$$H(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$$

where $p_i$ = proportion of data points belonging to class $i$.
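A minimal sketch of this formula in Python (the function name `entropy` is my own, not from the original post): it counts how often each class appears and sums $-p_i \log_2 p_i$ over the classes.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    # Sum -p_i * log2(p_i) over each class; pure datasets give 0.0
    return -sum((n / total) * log2(n / total) for n in counts.values())
```

For a pure dataset such as `["slow", "slow", "slow"]` this returns 0.0; for a 50/50 mix it returns 1.0, the maximum for two classes.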
Example
Let’s say we have the dataset below:

| grade | bumpiness | speed limit | speed |
|-------|-----------|-------------|-------|
| steep | bumpy     | yes         | slow  |
| steep | smooth    | yes         | slow  |
| flat  | bumpy     | no          | fast  |
| steep | smooth    | no          | fast  |
Entropy is calculated on the target column, `speed`. Two of the four examples are `slow` and two are `fast`, so $p_{slow} = p_{fast} = 0.5$:

$$H(S) = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{2}\log_2\tfrac{1}{2} = 1$$

This is the maximum possible entropy for two classes: the dataset is as impure as it can be.
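The same calculation can be checked directly in Python (a self-contained sketch; the variable names are illustrative):

```python
from math import log2

labels = ["slow", "slow", "fast", "fast"]  # the speed column above
p_slow = labels.count("slow") / len(labels)  # 0.5
p_fast = labels.count("fast") / len(labels)  # 0.5
# H = -p_slow*log2(p_slow) - p_fast*log2(p_fast)
H = -(p_slow * log2(p_slow) + p_fast * log2(p_fast))
print(H)  # 1.0
```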