Decision Tree: Splitting Criteria

How Do We Choose the First Split in a Decision Tree?

When building a Decision Tree, the very first split is crucial because it sets the foundation for the rest of the tree. But how do we decide which feature should be used for that split?

The answer: we pick the feature that creates the purest (most homogeneous) child nodes after the split.
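In practice, a library such as scikit-learn runs this search automatically: every candidate split is scored with an impurity criterion ("gini" or "entropy"), and the feature that produces the purest children ends up at the root. Below is a minimal sketch on a made-up two-feature dataset (one informative feature, one pure noise); the data and random seed are assumptions chosen only for illustration.

```python
# Minimal sketch: inspect which feature scikit-learn's CART picks for the first split.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Made-up data: feature 0 tracks the class label closely, feature 1 is noise.
y = np.array([0] * 50 + [1] * 50)
X = np.column_stack([
    y + rng.normal(scale=0.3, size=100),   # informative feature
    rng.normal(size=100),                  # uninformative feature
])

clf = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)

# tree_.feature[0] is the feature index used at the root node (the first split).
print("Root split feature:  ", clf.tree_.feature[0])            # expected: 0 (the informative one)
print("Root split threshold:", round(clf.tree_.threshold[0], 3))
```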


Homogeneous vs. Non-Homogeneous Nodes

  • Homogeneous Node:
    A node is called homogeneous if almost all the instances that fall into it belong to the same class.

    • Example: If 10 out of 10 (100%) or 9 out of 10 (90%) samples in a node belong to class C0, then the node is highly homogeneous.

    • Homogeneous nodes are desirable because they make clear, confident predictions.

  • Non-Homogeneous Node:
    A node is non-homogeneous when it contains a mixed distribution of classes.

    • Example: If 5 out of 10 samples belong to C0 and the other 5 belong to C1, the node is maximally non-homogeneous for a two-class problem.

    • These nodes are less useful because the model is uncertain about the prediction (the sketch after this list puts numbers on both cases).
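To put numbers on "homogeneous", here is a small sketch (plain Python, with the class counts taken from the examples above) that computes Gini impurity and entropy for those nodes; pure nodes score 0 and the 50/50 node scores the maximum.

```python
# Minimal sketch: quantify node homogeneity with Gini impurity and entropy.
from math import log2

def gini(counts):
    """Gini impurity of a node given its class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def entropy(counts):
    """Entropy (in bits) of a node given its class counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

for counts in [(10, 0), (9, 1), (5, 5)]:   # (C0, C1) counts from the examples above
    print(counts, "gini:", round(gini(counts), 3), "entropy:", round(entropy(counts), 3))

# (10, 0) -> gini 0.0,  entropy 0.0    (perfectly homogeneous)
# (9, 1)  -> gini 0.18, entropy 0.469  (highly homogeneous)
# (5, 5)  -> gini 0.5,  entropy 1.0    (maximally mixed)
```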


Measures of Impurity:

  • Gini Index (GI)
    • Definition / idea: the probability of mislabeling a randomly chosen sample if it were labeled according to the node's class distribution.
    • Range: 0 (pure) to ~0.5 (for two classes).
    • Pros: simple, fast to compute, works well in practice.
    • Cons: not as interpretable as entropy.
    • Commonly used in: CART.

  • Information Gain (IG)
    • Definition / idea: the reduction in entropy achieved by a split.
    • Range: 0 to 1 (for two classes).
    • Pros: clear information-theoretic meaning, widely known.
    • Cons: biased toward features with many distinct values.
    • Commonly used in: ID3, C4.5.

  • Gain Ratio
    • Definition / idea: information gain normalized by the split information, which penalizes splits with many branches.
    • Range: 0 to 1.
    • Pros: avoids IG's bias, more balanced feature selection.
    • Cons: can undervalue features with fewer values.
    • Commonly used in: C4.5.

  • Misclassification Error
    • Definition / idea: the fraction of samples at a node that would be misclassified by predicting the majority class.
    • Range: 0 (pure) to ~0.5 (for two classes).
    • Pros: intuitive and simple.
    • Cons: less sensitive to changes in the class distribution, so not great for choosing splits.
    • Commonly used in: the pruning phase.
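The sketch below scores a single candidate binary split with all four measures from the list above; the parent and child class counts are made up purely for illustration. Lower weighted Gini or misclassification error is better, higher information gain or gain ratio is better, and comparing these scores across candidate features is exactly how the first split would be chosen.

```python
# Minimal sketch: score one candidate binary split with the four impurity measures.
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def gini(counts):
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def misclassification(counts):
    total = sum(counts)
    return 1.0 - max(counts) / total

parent = (10, 10)                 # 10 samples of C0, 10 of C1 (made-up numbers)
children = [(8, 2), (2, 8)]       # class counts in the two child nodes after the split

n = sum(parent)
weights = [sum(c) / n for c in children]

# Weighted child impurity (lower is better).
gini_split = sum(w * gini(c) for w, c in zip(weights, children))
error_split = sum(w * misclassification(c) for w, c in zip(weights, children))

# Information gain = entropy of parent minus weighted entropy of children (higher is better).
info_gain = entropy(parent) - sum(w * entropy(c) for w, c in zip(weights, children))

# Gain ratio = information gain / split information (entropy of the branch proportions).
split_info = -sum(w * log2(w) for w in weights if w > 0)
gain_ratio = info_gain / split_info

print("Gini after split:        ", round(gini_split, 3))    # 0.32
print("Misclassification error: ", round(error_split, 3))   # 0.2
print("Information gain:        ", round(info_gain, 3))     # 0.278
print("Gain ratio:              ", round(gain_ratio, 3))    # 0.278
```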



