Decision Tree: Splitting Criteria

How Do We Choose the First Split in a Decision Tree?

When building a Decision Tree, the very first split is crucial because it sets the foundation for the rest of the tree. But how do we decide which feature should be used for that split?

The answer: we pick the feature that creates the purest (most homogeneous) child nodes after the split.


Homogeneous vs. Non-Homogeneous Nodes

  • Homogeneous Node:
    A node is called homogeneous if almost all the instances that fall into it belong to the same class.

    • Example: If 10 out of 10 (100%) or 9 out of 10 (90%) samples in a node belong to class C0, then the node is highly homogeneous.

    • Homogeneous nodes are desirable because they make clear, confident predictions.

  • Non-Homogeneous Node:
    A node is non-homogeneous when it contains a mixed distribution of classes.

    • Example: If 5 out of 10 samples belong to C0 and the other 5 belong to C1, the node is very non-homogeneous.

    • These nodes are less useful because the model is uncertain about the prediction; the sketch below quantifies the difference.
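To put numbers on these examples, here is a minimal Python sketch (an illustration only; the function name is ours) that scores the three node compositions above with the Gini index, one of the impurity measures described in the next section:

```python
def gini(counts):
    """Gini index of a node, given per-class sample counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

# Node compositions from the examples above, as (C0 count, C1 count)
print(gini([10, 0]))  # 0.0   -> perfectly homogeneous
print(gini([9, 1]))   # ~0.18 -> highly homogeneous
print(gini([5, 5]))   # 0.5   -> maximally non-homogeneous (two-class case)
```

Lower scores mean purer nodes, which is exactly why the mixed 5/5 node is the least useful for prediction.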


Measures of Impurity

  • Gini Index (GI)
    • Definition / idea: the probability of mislabeling a randomly chosen sample if labels were assigned according to the node's class distribution.
    • Range: 0 (pure) to 0.5 for two-class problems.
    • Pros: simple, fast to compute, works well in practice.
    • Cons: not as interpretable as entropy.
    • Commonly used in: CART.

  • Information Gain (IG)
    • Definition / idea: the reduction in entropy achieved by a split.
    • Range: 0 to 1 (bits, for two-class problems).
    • Pros: clear information-theoretic meaning, widely known.
    • Cons: biased toward features with many distinct values.
    • Commonly used in: ID3, C4.5.

  • Gain Ratio
    • Definition / idea: a normalized version of IG that penalizes splits with many branches.
    • Range: 0 to 1.
    • Pros: avoids IG's bias, gives more balanced feature selection.
    • Cons: can undervalue features with fewer values.
    • Commonly used in: C4.5.

  • Misclassification Error
    • Definition / idea: the fraction of samples at a node that do not belong to its majority class.
    • Range: 0 (pure) to 0.5 for two-class problems.
    • Pros: intuitive and simple.
    • Cons: less sensitive to changes in class distribution, so not great for choosing splits.
    • Commonly used in: the pruning phase.
