Decision Tree: Splitting Criteria

How Do We Choose the First Split in a Decision Tree?

When building a Decision Tree, the very first split is crucial because it sets the foundation for the rest of the tree. But how do we decide which feature should be used for that split?

The answer: we pick the feature that creates the purest (most homogeneous) child nodes after the split.


Homogeneous vs. Non-Homogeneous Nodes

  • Homogeneous Node:
    A node is called homogeneous if almost all the instances that fall into it belong to the same class.

    • Example: If 10 out of 10 (100%) or 9 out of 10 (90%) samples in a node belong to class C0, then the node is highly homogeneous.

    • Homogeneous nodes are desirable because they make clear, confident predictions.

  • Non-Homogeneous Node:
    A node is non-homogeneous when it contains a mixed distribution of classes.

    • Example: If 5 out of 10 samples belong to C0 and the other 5 belong to C1, the node is very non-homogeneous.

    • These nodes are less useful because the model is uncertain about the prediction; the sketch below quantifies the difference.
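To put numbers on these examples, here is a minimal Python sketch (an illustration only; the function name is ours) that scores the three node compositions above with the Gini index, one of the impurity measures described in the next section:

```python
def gini(counts):
    """Gini index of a node, given per-class sample counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# Node compositions from the examples above, as (C0 count, C1 count)
print(gini([10, 0]))  # 0.0   -> perfectly homogeneous
print(gini([9, 1]))   # ~0.18 -> highly homogeneous
print(gini([5, 5]))   # 0.5   -> maximally non-homogeneous (two-class case)
```

Lower scores mean purer nodes, which is exactly why the mixed 5/5 node is the least useful for prediction.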


Measures of Impurity

  • Gini Index (GI)
    • Definition / idea: the probability of mislabeling a randomly chosen sample if labels were assigned according to the node's class distribution.
    • Range: 0 (pure) to 0.5 for two-class problems.
    • Pros: simple, fast to compute, works well in practice.
    • Cons: not as interpretable as entropy.
    • Commonly used in: CART.

  • Information Gain (IG)
    • Definition / idea: the reduction in entropy achieved by a split.
    • Range: 0 to 1 (bits, for two-class problems).
    • Pros: clear information-theoretic meaning, widely known.
    • Cons: biased toward features with many distinct values.
    • Commonly used in: ID3, C4.5.

  • Gain Ratio
    • Definition / idea: a normalized version of IG that penalizes splits with many branches.
    • Range: 0 to 1.
    • Pros: avoids IG's bias, gives more balanced feature selection.
    • Cons: can undervalue features with fewer values.
    • Commonly used in: C4.5.

  • Misclassification Error
    • Definition / idea: the fraction of samples at a node that do not belong to its majority class.
    • Range: 0 (pure) to 0.5 for two-class problems.
    • Pros: intuitive and simple.
    • Cons: less sensitive to changes in class distribution, so not great for choosing splits.
    • Commonly used in: the pruning phase.
