Decision Tree - Classification
Induction Algorithm
- Tree is constructed in a greedy, top-down,recursive, divide-and-conquer manner
- Build one node at a time, starting with the root
- At the start: use all data samples to determine best attribute for the root
- Based on that attribute, select data subset that goes to the children
- For each child, repeat the process
- Test attributes are selected based on a statistical measure (e.g., information gain)
Stopping Criteria- How do we know when to stop?
- All samples for a given node belong to the same class
- There are no remaining attributes for further partitioning – majority voting is employed for classifying the leaf
- There are no samples left
Hunt's Algorithm
General Procedure:
- If Dnode contains records that all belong the same class ynode, then node is a leaf node
- labeled as ynode
- If Dnode is an empty set, then node is a leaf node labeled by the default class, ydefault
- If Dnode contains records that belong to more than one class, use an attribute test to split
- the data into smaller subsets.
- Recursively apply the procedure to each subset.
Step-by-Step Procedure
Imagine you’re at a node in the tree with a group of training examples, which we’ll call Dnode.
-
All records belong to one class → Stop
-
If all the examples in Dnode are from the same class, then this node becomes a leaf node.
-
The leaf is labeled with that class.
-
Example: If all customers in this group “buy” a product, the leaf is labeled “Buy.”
-
-
No records left → Stop
-
If Dnode is empty (no examples reach this node), the algorithm creates a leaf node with a default class.
-
The default class is usually the majority class of the parent node.
-
-
Mixed records → Split further
-
If Dnode contains a mix of different classes, the algorithm chooses an attribute test (a question like “Is Age > 30?” or “Is Weather = Sunny?”).
-
This test splits the data into smaller subsets.
-
Each subset becomes a new child node.
-
-
Repeat the process
-
The algorithm is applied recursively to each child node, checking again: are all records the same class? Empty? Or mixed?
-
This continues until the stopping condition is reached (all leaves are pure or no data is left).
-
Comments
Post a Comment